Hello everyone,
I am the original author of this previous thread: Thinking mode unstable, not working, working 1 time over 20. I am writing to provide a critical update and a deeper analysis of the “Thinking Mode” instability, hoping this new information will assist the Google development team in diagnosing and resolving this persistent issue.
First, thank you to @Lalit_Kumar and the team for your initial acknowledgment, and to community members like @Michael_Bomholt for contributing to the analysis.
1. Empirical Data: The Problem Has Escalated
The primary reason for this follow-up is that the performance degradation has become significantly more severe. While my initial report noted instability primarily occurring above the 100k-150k token mark, I can now confirm with high certainty that the “Thinking Mode” consistently becomes unstable and fails in a much lower range of 45,000 to 85,000 tokens.
This is not an intermittent issue. It is a predictable failure point in any prolonged, complex session.
2. Strategic Impact: A Core Value Proposition Is Undermined
This instability is more than a technical bug; it is a critical impediment to the core value proposition of Gemini 2.5 Pro. The model is marketed for its large context window, designed for deep, iterative analytical work like code analysis, legal document review, and complex research synthesis. The current failure state prevents this exact type of high-value work, rendering the model unreliable for the very power users it aims to attract.
3. Deeper Analysis: The “Context Churn” Hypothesis
Building on the discussion in this thread, I propose a leading hypothesis: the issue is not tied to the static context size displayed in the UI. Instead, it appears to be directly correlated with what can be termed “Context Churn”.
I define “Context Churn” as the cumulative processing load resulting from frequent message edits, deletions, and regenerations within a single session. Our observations strongly suggest that the more a session is edited, the faster the “Thinking Mode” degrades, irrespective of the final token count. This implies that each interaction sends the entire, modified history for reprocessing, leading to a computational overhead that the system cannot sustain long-term.
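To make this hypothesis concrete, here is a minimal sketch of the load model I have in mind. It assumes, purely for illustration, that every edit or regeneration forces the entire modified history to be reprocessed; the `Interaction` shape and all numbers are hypothetical, not a claim about Gemini’s actual architecture.

```typescript
// Hypothetical "Context Churn" model: if every edit/regeneration forces the
// full modified history to be re-read, cumulative load grows far faster than
// the visible token count of the final transcript.

interface Interaction {
  promptTokens: number;   // tokens added by the user's message
  replyTokens: number;    // tokens added by the model's reply
  regenerations: number;  // extra times this turn was edited/regenerated
}

function cumulativeProcessedTokens(initialContext: number, turns: Interaction[]): number {
  let history = initialContext; // tokens currently in the conversation
  let processed = 0;            // total tokens the backend has had to re-read

  for (const turn of turns) {
    history += turn.promptTokens + turn.replyTokens;
    // 1 original generation + N regenerations, each re-reading the whole history
    processed += history * (1 + turn.regenerations);
  }
  return processed;
}

// Toy session: 30k-token initial context, 25 turns, half of them regenerated twice.
const turns: Interaction[] = Array.from({ length: 25 }, (_, i) => ({
  promptTokens: 400,
  replyTokens: 800,
  regenerations: i % 2 === 0 ? 2 : 0,
}));

const finalVisible = 30_000 + turns.reduce((sum, t) => sum + t.promptTokens + t.replyTokens, 0);
console.log(`Visible token count of final transcript: ${finalVisible}`);
console.log(`Cumulative tokens reprocessed under churn: ${cumulativeProcessedTokens(30_000, turns)}`);
```

In this toy session the final transcript shows roughly 60k visible tokens, yet the cumulative reprocessing load is many times larger; that hidden load, rather than the visible count, is what I believe correlates with the failures.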
4. Replicable Test Case & Proactive Troubleshooting
To help your team reproduce this failure, here is a specific scenario:
- Start a new chat session in Gemini.
- Provide an initial context of approximately 30,000 tokens (e.g., a large code file or document).
- Engage in 20-30 iterative interactions. Crucially, in about 50% of these interactions, delete or edit your prompt and regenerate the AI’s response multiple times.
- Observe the behavior. The “Thinking Mode” should begin to falter (stalling, producing incomplete output, or failing to start) as the session’s visible token count approaches the 50k-60k range.
To aid in diagnostics, we have already attempted standard troubleshooting without success. The issue persists after:
- Clearing browser cache and cookies.
- Using different browsers (Chrome, Firefox, Edge).
- Operating in incognito or private browsing modes.
This strongly suggests the problem is server-side rather than client-side.
5. Specific Questions for the Development Team
To move forward constructively, we would appreciate specific feedback on the following:
- Hypothesis Validation: Can you confirm if the “Context Churn” hypothesis is plausible from an architectural standpoint? Does the system indeed face escalating processing load with frequent edits?
- Diagnostic Data: Can you provide guidance on what specific diagnostic information would be most useful for us to capture? For example, are there specific browser console errors (e.g., JavaScript exceptions, failed network requests with codes like 429 or 504) that we should look for and report?
- Roadmap: Is there any update on the timeline or roadmap for addressing this core stability issue for long-context sessions?
6. Call for Community Corroboration
I encourage other users experiencing this to contribute to this thread by providing the following data points:
- The approximate token count at which instability begins for you.
- Your typical use case (e.g., code generation, document analysis).
- If possible, any error messages from the browser’s developer console (F12 key) when the failure occurs; a small helper snippet for logging failed requests is included below.
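For anyone unsure what to capture for that last point, here is a minimal, hypothetical snippet (written as TypeScript, but it is plain JavaScript and can be pasted directly into the console). It only logs fetch-based requests and makes no assumptions about which endpoints Gemini actually calls; the “[thinking-mode-debug]” label is my own invention.

```typescript
// Paste into the browser developer console (F12 -> Console) before reproducing the failure.
// Wraps window.fetch so any non-OK response (e.g. 429 or 504) is logged with its status
// and URL, which can then be copied into this thread along with the session's token count.
const originalFetch = window.fetch.bind(window);
window.fetch = async (input, init) => {
  const response = await originalFetch(input, init);
  if (!response.ok) {
    console.warn("[thinking-mode-debug] failed request:", response.status, response.url);
  }
  return response;
};
console.info("[thinking-mode-debug] request logging enabled");
```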
I am eager to collaborate to make Gemini a more robust and reliable tool. Thank you for your time and attention to this critical matter.