Okay, well, in your case, the best thing to use would be Gemini (either Flash or the regular one, doesn't really matter, they're both pretty good versions). But the regular thinking model can produce around 65k tokens in a single response, while Flash and other models top out closer to 8k tokens per reply.
With Gemini, the really cool thing is the big context window. It's supposed to be around a million tokens, BUT once you get past about 30k tokens the chat can start to get a little slow for some people (for me it only starts lagging around 200k). And I'm always using a PC, so I'm not sure how it is on phones.
Pretty much all other AIs have context windows of around 32k tokens. That means if you dump too much text in, they'll start forgetting earlier parts of the conversation and messing things up, because there's just more than they can hold at once.
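If you want to check how big your text actually is in tokens (not characters), here's a quick sketch using OpenAI's tiktoken library. Tokenizers differ between models, and Gemini uses its own, so treat the number as a ballpark; the file name is just a placeholder:

```python
# pip install tiktoken
import tiktoken

# cl100k_base is one of OpenAI's tokenizers; Gemini tokenizes
# differently, so this only gives a rough estimate.
enc = tiktoken.get_encoding("cl100k_base")

with open("my_chapter.txt", encoding="utf-8") as f:  # placeholder file
    text = f.read()

tokens = enc.encode(text)
print(f"{len(text)} characters ~ {len(tokens)} tokens")
```

Rule of thumb: one token is roughly 4 characters, or about three quarters of an English word.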
As for running LLMs locally, yeah, you can, but you need a pretty beefy PC, with at least 8GB of VRAM on your graphics card. PLUS, local models are usually not as smart as the online AIs, and they also tend to have those smaller context windows of around 32k tokens.
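If you still want to try it, here's a minimal sketch using the llama-cpp-python bindings. The model file name is a placeholder (you'd download a quantized GGUF model yourself, e.g. from Hugging Face), and `n_gpu_layers=-1` assumes the whole model fits in your 8GB of VRAM:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Placeholder path: smaller quantizations (Q4 and below) fit
# in 8GB of VRAM more easily.
llm = Llama(
    model_path="models/some-model.Q4_K_M.gguf",  # hypothetical file name
    n_ctx=32768,      # context window in tokens; many local models cap out here
    n_gpu_layers=-1,  # offload all layers to the GPU (assumes enough VRAM)
)

result = llm(
    "Polish the following paragraph without changing its meaning:\n\n...",
    max_tokens=1024,
)
print(result["choices"][0]["text"])
```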
So, what you probably need to do is split your text into chunks of about 8k tokens (very roughly 30k characters, or around 6,000 words), and then send each chunk to the AI with the task you want it to do.
Like, say:
Your request: Here's a piece of my novel, about 900 words. Can you polish it up?
AI's answer: Here's the edited piece of your novel, all 900 words, polished up!
Then you just copy and paste the result, and repeat chunk by chunk (or script the loop, see the sketch below).
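If you'd rather not do all that copy-pasting by hand, here's a hedged sketch of the chunk-by-chunk loop using Google's google-generativeai Python package. The model name, chunk size, file names, and prompt are all assumptions; adjust them to whatever you actually use:

```python
# pip install google-generativeai
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder: use your own key
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name

CHUNK_CHARS = 30_000  # ~8k tokens at roughly 4 characters per token

with open("novel.txt", encoding="utf-8") as f:  # placeholder file
    text = f.read()

# Naive splitting on paragraph boundaries so chunks don't cut mid-sentence.
chunks, current = [], ""
for para in text.split("\n\n"):
    if current and len(current) + len(para) > CHUNK_CHARS:
        chunks.append(current)
        current = ""
    current += para + "\n\n"
if current:
    chunks.append(current)

with open("novel_polished.txt", "w", encoding="utf-8") as out:
    for i, chunk in enumerate(chunks, 1):
        response = model.generate_content(
            "Here's a piece of my novel. Polish the prose without "
            "changing the plot or the meaning:\n\n" + chunk
        )
        out.write(response.text + "\n\n")
        print(f"chunk {i}/{len(chunks)} done")
```

Splitting on paragraph breaks (rather than a fixed character count) keeps each chunk coherent, which matters because the model only sees one chunk at a time.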
AI translated*