I normally use huge context windows since I need them for my work, like scanning big PDFs, code, etc. Recently I noticed that both Gemini 2.0 Flash Thinking and Gemini 1206 hallucinate a lot once the context crosses 200k tokens, especially Gemini 2.0 Flash Thinking. So, am I the only one getting this many hallucinations, or does it happen to everybody?
I’ve consistently encountered this issue too, especially beyond 128k tokens of context; it’s a drawback of many existing long-context models. Gemini 1206 performs better than Gemini 2.0 Flash Thinking, with fewer hallucinations. If you need workarounds, my suggestions are:
1. Use Claude 3.5 Sonnet. It has very low hallucination rates with long texts, even though it only has a 200k context window.
2. When using Gemini, break down your tasks or instructions into several subtasks (see the sketch below). The model hallucinates less this way, because it’s difficult for models to simultaneously achieve high accuracy in both “needle in a haystack” retrieval and complex reasoning. If you reduce the difficulty of each single query, in my experience, hallucinations decrease.
Also, remember that Gemini 2.0 Flash Thinking is still a Flash-level model. For complex tasks, especially those requiring contextual understanding with minimal hallucinations, Gemini 1206 remains state-of-the-art (SOTA).
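Here’s a rough sketch of what I mean by splitting subtasks, using the Python google-generativeai SDK. The model id, API key placeholder, and prompts are just assumptions for illustration; the point is to do retrieval and reasoning as two separate calls instead of one giant query:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
# Placeholder model id -- swap in whichever long-context Gemini variant you use.
model = genai.GenerativeModel("gemini-exp-1206")

def answer_over_long_doc(document: str, question: str) -> str:
    # Subtask 1: pure retrieval. Ask only for verbatim excerpts, no reasoning yet.
    find_prompt = (
        "From the document below, quote verbatim every passage relevant to the "
        f"question. Do not answer the question yet.\n\nQuestion: {question}\n\n"
        f"Document:\n{document}"
    )
    excerpts = model.generate_content(find_prompt).text

    # Subtask 2: reasoning. Answer using only the short excerpt list, so the
    # model doesn't have to search 200k+ tokens while it reasons.
    answer_prompt = (
        "Using only these excerpts, answer the question.\n\n"
        f"Excerpts:\n{excerpts}\n\nQuestion: {question}"
    )
    return model.generate_content(answer_prompt).text
```

Splitting it like this isn’t a cure-all, but in my experience each easier subtask hallucinates noticeably less than one query that has to search and reason at the same time.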
Happens to me too. I ask Gemini to save its work and structure a new prompt, then start a new conversation.
I have this problem too. I try to use G2FE to analyze and improve long documents, and before long it is confusing things it said earlier or producing complete hallucinations about material in the text under discussion.