Alright, I’ll just describe what I did and what happened. I wasn’t able to solve the issue or pinpoint the exact cause, but maybe someone else will find something useful here.
Here’s the error: when making a POST request to GenerateContent, I receive the following response: “The input token count exceeds the maximum number of tokens allowed (1048576).” In the UI, this is later displayed as “An internal error has occurred.”
However, the actual token count in my chat is only about 800k out of the allowed 1,048,576.
Next, I used “Get SDK code to chat with Gemini”. After stripping out all the unnecessary symbols ({}`'][*#, etc.) and executable code, I was left with the clean Q&A text of the chat.
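For anyone who wants to repeat this, the cleanup step can be sketched roughly like this. This is just an illustration of what I did, not the exact script I used; `clean_chat_export` is a hypothetical helper, and the character set matches the symbols I removed by hand:

```python
import re

def clean_chat_export(raw: str) -> str:
    """Strip markup/structural symbols from an SDK chat export,
    leaving plain Q&A text. A rough sketch of the manual cleanup
    described above."""
    # Remove the structural characters ({}`'][*# etc.)
    cleaned = re.sub(r"[{}`'\[\]*#]", "", raw)
    # Collapse the runs of spaces left behind by the removals
    cleaned = re.sub(r"[ \t]+", " ", cleaned)
    return cleaned.strip()

print(clean_chat_export("{'role': 'user', 'text': '# Hello `world`'}"))
```

Removing actual executable code blocks from the export would still need to be done separately; the regex above only strips the stray symbols.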
When I paste this entire cleaned text into a new chat, the error reappears, as expected. But I found the exact token boundary for the error: at 741,126 tokens everything works fine, while at 741,127 tokens it fails.
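In case anyone wants to reproduce this boundary search more efficiently than I did, the idea is a plain binary search over prompt sizes. Here `works` is a hypothetical probe (not a real SDK call) that would submit a prompt of a given token count and report whether the request succeeded; the mock below just reproduces the boundary I observed:

```python
def find_failure_boundary(works, lo, hi):
    """Binary-search the largest token count that still succeeds.

    Assumes a single clean boundary: every count <= boundary works
    and every count above it fails.
    Invariant: works(lo) is True, works(hi) is False.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(mid):
            lo = mid
        else:
            hi = mid
    return lo  # largest passing count; lo + 1 is the first failure

# Mocked probe reproducing the boundary from my chat:
boundary = 741_126
print(find_failure_boundary(lambda n: n <= boundary, 1, 1_048_576))  # → 741126
```

With a real probe, each call would mean one GenerateContent request, so the search takes about 20 requests instead of hundreds.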
Interestingly, if I take a book, say “The Count of Monte Cristo,” and paste it entirely into the chat with a simple ctrl+c, ctrl+v, the model easily handles and responds even when the context exceeds 900k tokens.
Moreover, if I fill the initial message with repeated ‘A’ characters, the failure threshold drops significantly. I only tested down to around 600 tokens, because the browser tab’s memory usage became extreme (up to 5GB), causing severe UI lag. Reducing the number of characters didn’t substantially alleviate this, and the tests became very time-consuming to perform.
So I still haven’t identified precisely why or how this issue arises. The 741,126-token limit only seems to trigger on my specific text. With other texts, either no limit applies or a higher one does, as with “The Count of Monte Cristo,” where I stopped testing out of fatigue.
I’m not sure whether this is related to the number of messages. In the chat with the 741,126-token limit, there were about 4 messages from me and roughly 8 from the model, since it responded multiple times to one large message. I chose not to test reaching exactly 741,126 tokens in a single initial message, as that again creates severe browser performance issues when processing such a large input.