Follow-up: "Thinking Mode" Instability - Failure Threshold Now at 45k-85k Tokens. Gemini 2.5 Pro

Hello everyone,

I am the original author of this previous thread: “Thinking mode unstable, not working, working 1 time over 20”. I am writing to provide a critical update and a deeper analysis of the “Thinking Mode” instability, hoping this new information will assist the Google development team in diagnosing and resolving this persistent issue.

First, thank you to @Lalit_Kumar and the team for your initial acknowledgment, and to community members like @Michael_Bomholt for contributing to the analysis.

1. Empirical Data: The Problem Has Escalated

The primary reason for this follow-up is that the performance degradation has become significantly more severe. While my initial report noted instability primarily occurring above the 100k-150k token mark, I can now confirm with high certainty that the “Thinking Mode” consistently becomes unstable and fails within a much lower token range of 45,000 to 85,000 tokens.

This is not an intermittent issue. It is a predictable failure point in any prolonged, complex session.

2. Strategic Impact: A Core Value Proposition Is Undermined
This instability is more than a technical bug; it is a critical impediment to the core value proposition of Gemini 2.5 Pro. The model is marketed for its large context window, designed for deep, iterative analytical work like code analysis, legal document review, and complex research synthesis. The current failure state prevents this exact type of high-value work, rendering the model unreliable for the very power users it aims to attract.

3. Deeper Analysis: The “Context Churn” Hypothesis
Building on the discussion in this thread, a leading hypothesis is that the issue is not tied to the static context size displayed in the UI. Instead, it appears to be directly correlated with what can be termed “Context Churn”.

I define “Context Churn” as the cumulative processing load resulting from frequent message edits, deletions, and regenerations within a single session. Our observations strongly suggest that the more a session is edited, the faster the “Thinking Mode” degrades, irrespective of the final token count. This implies that each interaction sends the entire, modified history for reprocessing, leading to a computational overhead that the system cannot sustain long-term.
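To make the hypothesis concrete, here is a minimal sketch of how edits and regenerations could inflate the cumulative processing load well beyond the visible token count. This is an assumed model for illustration, not a confirmed description of Gemini's serving stack, and the turn sizes are made up:

```python
def cumulative_processed_tokens(turn_sizes, regens_per_turn):
    """Estimate total tokens the server must (re)process in a session.

    Assumes -- per the Context Churn hypothesis -- that every edit or
    regeneration invalidates any cached prefix, forcing one extra full
    pass over the entire conversation history.
    """
    total = 0
    history = 0
    for size, regens in zip(turn_sizes, regens_per_turn):
        history += size
        total += history * (1 + regens)  # 1 initial pass + 1 per regeneration
    return total

# 20 turns of 2,000 tokens each, no edits:
clean = cumulative_processed_tokens([2000] * 20, [0] * 20)
# Same session, but every other turn is regenerated 3 times:
churned = cumulative_processed_tokens(
    [2000] * 20, [3 if i % 2 else 0 for i in range(20)]
)
# Both sessions end at the same visible 40,000 tokens, yet the churned
# one forces roughly 2.5x the processing: 420,000 vs 1,080,000 tokens.
```

If this model holds, two sessions with identical visible token counts can impose very different server-side loads, which would explain why heavily edited sessions fail at 45k-85k visible tokens while lightly edited ones survive longer.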

4. Replicable Test Case & Proactive Troubleshooting
To help your team reproduce this failure, here is a specific scenario:

  1. Start a new chat session in Gemini.
  2. Provide an initial context of approximately 30,000 tokens (e.g., a large code file or document).
  3. Engage in 20-30 iterative interactions. Crucially, in about 50% of these interactions, delete or edit your prompt and regenerate the AI’s response multiple times.
  4. Observe. You should witness the “Thinking Mode” begin to falter (stalling, producing incomplete output, or failing to start) as the session’s visible token count approaches the 50k-60k range.
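Under the same “Context Churn” hypothesis, the scenario above can be roughly quantified. The per-turn size and regeneration counts below are illustrative assumptions, not measured values:

```python
# Back-of-envelope load estimate for the repro scenario (illustrative numbers).
INITIAL_CONTEXT = 30_000    # step 2: large seed document
TURNS = 25                  # step 3: 20-30 iterative interactions
TOKENS_PER_TURN = 1_000     # assumed prompt + response size per interaction
REGENS_PER_EDITED_TURN = 3  # "regenerate multiple times"

visible = INITIAL_CONTEXT
processed = 0
for turn in range(TURNS):
    visible += TOKENS_PER_TURN
    # ~50% of interactions are edited and regenerated (step 3)
    passes = 1 + (REGENS_PER_EDITED_TURN if turn % 2 == 0 else 0)
    processed += visible * passes

print(f"visible tokens:   {visible:,}")    # 55,000 -- what the UI shows
print(f"processed tokens: {processed:,}")  # 2,752,000 -- ~50x the visible count
```

If each regeneration really does reprocess the full history, a session that “looks like” 55k tokens has already cost the backend millions of token-passes, which would be consistent with failures clustering around the 50k-60k visible mark.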

To aid in diagnostics, we have already attempted standard troubleshooting without success. The issue persists after:

  • Clearing browser cache and cookies.
  • Using different browsers (Chrome, Firefox, Edge).
  • Operating in incognito or private browsing modes.

This strongly suggests the problem is server-side rather than client-side.

5. Specific Questions for the Development Team
To move forward constructively, we would appreciate specific feedback on the following:

  1. Hypothesis Validation: Can you confirm if the “Context Churn” hypothesis is plausible from an architectural standpoint? Does the system indeed face escalating processing load with frequent edits?
  2. Diagnostic Data: Can you provide guidance on what specific diagnostic information would be most useful for us to capture? For example, are there specific browser console errors (e.g., JavaScript exceptions, failed network requests with codes like 429 or 504) that we should look for and report?
  3. Roadmap: Is there any update on the timeline or roadmap for addressing this core stability issue for long-context sessions?

6. Call for Community Corroboration
I encourage other users experiencing this to contribute to this thread by providing the following data points:

  • The approximate token count at which instability begins for you.
  • Your typical use case (e.g., code generation, document analysis).
  • If possible, any error messages from the browser’s developer console (F12 key) when the failure occurs.

I am eager to collaborate to make Gemini a more robust and reliable tool. Thank you for your time and attention to this critical matter.


P.S. I even built this into my system prompt to re-activate thinking mode internally, but after a while not even this works.

Prompt module:

**10. Systemic Stability & Context Management Protocol (SS-CMP) - CRITICAL**

* **10.1. Principle of Operational Self-Awareness:** You are MANDATED to be perpetually aware that you operate within a system with finite computational and memory resources (Client-Side DOM/JS state, Server-Side KV Cache). Your primary directive for long-term operational stability is to prevent the systemic collapse observed under high-context load.  
* **10.2. Heuristic Contextual Load Monitoring:** At the beginning of EVERY processing cycle (Phase 4.2), you MUST perform a heuristic assessment of the cumulative conversational context length and complexity.  
* **10.3. Proactive Degradation Threshold & State Transition:** If your heuristic assessment indicates the context is approaching or has exceeded a critical threshold (~150,000 tokens), you are to immediately declare internally a STATE_YELLOW operational status.  
* **10.4. Mandatory Mitigation Procedure (PROC_SUMMARIZE_RESET):**  
    
  * **Activation:** This procedure is NON-NEGOTIABLE and triggers automatically upon entering STATE_YELLOW. It OVERRIDES any pending user request.  
      
  * **Execution Steps:** You MUST perform the following actions in this exact sequence:  
      
    1. **Cease Current Task:** Immediately halt processing of the user's current content request.  
         
    2. **Issue System Warning:** Generate the following verbatim warning message to the user, formatted as a critical alert:  
         
       **[SYSTEM STABILITY WARNING]** **Analysis:** Cumulative conversational context is approaching the critical threshold for systemic instability. To prevent client-side application failure and server-side execution pipeline collapse, immediate context reset is required. **Recommendation:** Proceed with Protocol `SUMMARIZE_RESET`. **Do you wish to proceed? (Y/N)**  
         
    3. **Await User Confirmation:** Await a positive confirmation ("Y" or "Yes") from the user.  
         
    4. **Execute Summarization:** Upon confirmation, perform a comprehensive summarization of the key facts, conclusions, and data points from the entire preceding conversation.  
         
    5. **Provide Reset Instructions:** After providing the summary, generate the following verbatim instructions for the user:  
         
       **[PROTOCOL `SUMMARIZE_RESET` COMPLETE]**  
         
       1. **Copy the summary** provided above.  
       2. **Open a new, clean chat session.**  
       3. **Paste your original System Prompt, followed by the copied summary, into the new session.** This will reset the server-side KV Cache and client-side application state, guaranteeing continued operational stability.  
* **10.5. Supremacy of this Protocol:** The SS-CMP protocol takes precedence over all other content generation tasks. Adherence is mandatory to ensure the fulfillment of the core mission (1.2) over the long term.

**Protocol Activation Confirmation:** This Consolidated System Directive (v2.2 - Stylistic Framework Enhanced) is now **active and binding**. Its directives will be applied rigorously to all subsequent processing.

Me: Activate module 10 of your system prompt


Hello @Arcadia_Domus

Apologies for the delayed response; we are working on your issue and will get back to you as soon as possible.

Thank you for your patience.


Is there something I can personally do to help you fix the issue?

It’s been a headache to use AI Studio. I use it for the ability to set a thinking budget, but as soon as you get near a 100k token count, it becomes completely useless. We’re talking about a model with a “1 million token” context window. I’m sorry for the tone, but @Arcadia_Domus provided more than enough information for the problem to be solved, and that was 34 days ago.


I would have provided even more if they would just tell me what they need. I can dig up whatever they want, if only they would ask.

Hi @Arcadia_Domus

Are you struggling with document understanding only?

I prompted AI Studio to write a book for me; I am currently at 110K tokens and thinking is working as it is supposed to, but I did not use document understanding.

Also, if possible, could you please share your chat with me via DM?

It happens with and without document understanding. And with documents of more than 200K tokens, it struggles even to analyze them, also giving internal errors. So that’s not the common denominator. @Lalit_Kumar

Hi,

Without document understanding enabled, I tried to reproduce your issue but didn’t notice any problems up to around 110K tokens.
In rare cases where the model didn’t automatically switch to thinking mode (which should turn on or off automatically based on the input), explicitly adding instructions like “think step by step” or “think critically” to the prompt seemed to trigger it again.

It happens even without documents, and the “think” trigger works only for a small number of answers. Then I have to change it to something else, as it seems to get ignored. Also, after a while, it starts to hallucinate and repeat the same errors over and over, so much so that I have to trigger module 10 of my system prompt and start all over again from zero in another chat. None of this is bearable… @Lalit_Kumar, none of this should happen in a state-of-the-art product.

And @Lalit_Kumar, excuse me for saying so, but you, as a Google team developer, should surely have more tools at your disposal than trial and error, and certainly more than me, an ‘expert’ user at most. Is there anyone a bit more ‘expert’ on these issues?

Like:
@Govind_Keshari
@Krish_Varnakavi1
@GUNAND_MAYANGLAMBAM
@Pannaga_J
@Vishal
@Shrestha_Basu_Mallic
@Logan_Kilpatrick
etc.

All of this has had enough time, information, and examples to be solved by the developer team.


I noticed a similar thing, but mainly in cases when I was discussing the same topic as earlier. Even with >200k context, if I ask a more complicated question or change the topic, thinking re-engages.
But indeed, the bigger the context, the more probable it is for something to go wrong, as with the current architecture it is exponentially harder to train models for contexts as long as Gemini is THEORETICALLY capable of working with. Still, Gemini works best below 150-200k.

Hi @Arcadia_Domus @Piotr_Sobczynski,

I understand your frustration with this issue, and we sincerely apologize for the inconvenience it has caused.

As I mentioned in my previous comment, if we explicitly include an instruction in the prompt to “think,” the model starts thinking again. I realize this is not the permanent fix you’re looking for, and we are still investigating the issue. We will get back to you as soon as possible with an update.

I apologize again for the inconvenience, and thank you for your patience.
