Gemini 3.1 Pro context window is not realistic

The 1 million token context window of Gemini 3.1 Pro Preview is a lie. The model already starts making noticeably more errors past 200k tokens. After 500-600k tokens, it becomes so prone to mistakes that it’s essentially unusable for anything valuable. If you ever get to 800-900k tokens, you’ll see that it doesn’t even matter what you write in the prompt, because about 3 times out of 4 the model will just hallucinate something random from past messages; it’s like rolling dice. For a SOTA model like this, that’s a big issue, and it says a lot that people at Google felt the need to lie to users about its actual context window performance. I understand that AI is new tech and that it’s really hard, but I expected a lot more from a company like Google.

Hello @BReal, can you share a bit more context to help us debug this issue? Specifically, the exact prompt you used and your model configuration (Temperature, Thinking Level, etc.) would be helpful.

You can just use the model and you will see that the 1 million token context window is a marketing lie. It can be reproduced under any temperature setting and with most types of prompts. Plenty of other users confirm this online, and it can be easily checked.
I use the model for coding, and as the context window grows the model gets a lot sloppier. It easily gets confused between different iterations of the same file and forgets what it has already output.
For example, very frequently I ask the model for a new task and it hallucinates, providing the code for an old task that it already completed earlier.

Anyone with eyes and fingers to type can easily reproduce this: the model gets dramatically worse as the context window grows and as you iterate on the same code multiple times.

In my personal testing this happens both with low temperature values (anything under 1.0) and with the normal temperature value (1.0). I have never tried higher temperatures, because for coding tasks the model is already quite unusable at 1.0, let alone above it.

Thinking level is always set to HIGH because I need it for complex tasks.
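
For anyone who wants a shareable repro rather than anecdotes, here is a rough sketch of how the "model answers from an old iteration of the file" behavior could be measured at different context sizes. Everything in it is an assumption for illustration: `call_model` is a hypothetical placeholder for whatever client you use (set temperature and thinking level there), the file name `app.py` and the `REVISION` marker are made up, and the 4-characters-per-token estimate is only a crude approximation of a real tokenizer.

```python
import re
from typing import Callable, List

CHARS_PER_TOKEN = 4  # crude assumption; real tokenizers differ


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


def build_history(target_tokens: int, filler_lines: int = 200) -> List[str]:
    """Create successive revisions of one file until the estimated
    total size reaches target_tokens. Each revision carries a unique
    REVISION marker so a reply can be checked against it."""
    revisions: List[str] = []
    total = 0
    while total < target_tokens:
        rev = len(revisions) + 1
        body = "\n".join(f"x{i} = {i + rev}" for i in range(filler_lines))
        text = f"# file: app.py\nREVISION = {rev}\n{body}\n"
        revisions.append(text)
        total += estimate_tokens(text)
    return revisions


def build_prompt(revisions: List[str]) -> str:
    """Join all revisions as numbered turns and ask about the newest one."""
    turns = [
        f"[turn {i}] Updated app.py:\n{text}"
        for i, text in enumerate(revisions, 1)
    ]
    question = (
        "What is the value of REVISION in the most recent app.py? "
        "Answer with the number only."
    )
    return "\n".join(turns) + "\n" + question


def is_stale(reply: str, latest: int) -> bool:
    """True when the reply does not name the latest revision number."""
    numbers = [int(n) for n in re.findall(r"\d+", reply)]
    return latest not in numbers


def stale_rate(
    call_model: Callable[[str], str], target_tokens: int, trials: int = 4
) -> float:
    """Fraction of trials in which the model answered from an old revision.

    call_model is a hypothetical hook: it takes the prompt text and
    returns the model's reply text.
    """
    revisions = build_history(target_tokens)
    prompt = build_prompt(revisions)
    latest = len(revisions)
    stale = sum(is_stale(call_model(prompt), latest) for _ in range(trials))
    return stale / trials
```

Running `stale_rate` at, say, 100k, 200k, 500k and 800k estimated tokens with the same settings would give numbers that can be posted alongside the exact prompt. It only probes one narrow failure (recalling the latest revision), so it is not a full measure of coding quality at long context.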