Gemini 2.5 Pro Exposing "Silent Thought" Process in Long Context Conversations

Hello everyone,

Our team is developing a virtual character chat product, and we are currently using the Gemini 2.5 Pro model.

We chose the Pro model because, although it is generally recommended for reasoning tasks rather than role-playing, our use case involves numerous and strict constraints for the virtual character. In our tests, only the Pro model, with its robust long-context capabilities, can consistently adhere to these guidelines.

To implement short-term memory for the character, we manually control and manage the conversation context, currently setting it to a length of approximately 150 conversation turns.
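For context, here is a minimal sketch of the kind of sliding-window trimming we do. The names and message shape are illustrative, not our actual code; the system prompt lives outside this list and is never trimmed:

```python
# Illustrative sketch of our short-term-memory management (names are made up).
MAX_TURNS = 150  # one turn = one user message plus one model reply


def build_context(history: list[dict]) -> list[dict]:
    """Return only the most recent MAX_TURNS turns of the conversation.

    `history` is an ordered list of messages like
    {"role": "user" | "model", "text": "..."}; the system prompt is
    passed separately and is never trimmed.
    """
    return history[-(MAX_TURNS * 2):]  # two messages per turn
```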

The core issue we are encountering is that when the number of conversation turns with an end-user grows and approaches our 150-turn limit, the model frequently includes its internal “silent thought” process in the final answer before providing the actual, in-character response that we need. This thinking process should not be visible to the user.

Notably, we have never observed a similar issue in shorter user sessions.

Therefore, I’d like to ask for your insights:

Is this phenomenon likely caused by the context window becoming too long, leading the LLM to fail to strictly adhere to the instructions and character constraints we have defined in the System Prompt (SP)? Or are there other potential causes we might be overlooking?

Has anyone else encountered a similar situation, or do you have any suggestions for potential solutions or debugging strategies?

Thank you very much for your help.

I can only assume this happens because the character's thoughts appear somewhere in the context and the model has decided to repeat them; perhaps the prompt contains an example or an instruction on how to think.

From my own experience: I once tried to influence the thought process by including an example, but that led to the reasoning appearing in the answer itself, also on Gemini 2.5 Pro.

Thanks so much for your reply, it offers some great directions for troubleshooting.

Building on your idea, I’d like to provide more detail about our current situation. In the cases where we see these abnormal responses, the generated content from Gemini 2.5 Pro follows a distinct structure:

  1. It begins by understanding the user’s input in the context of our conversation.

  2. Then, it clarifies the specific task assigned to its character persona.

  3. It proceeds to generate three potential replies.

  4. Next, it translates these replies into the appropriate language.

  5. After that, it evaluates these multi-choice responses.

  6. It performs a final review against its core principles.

  7. Finally, it decides on the ultimate response to send.

To be perfectly clear, this entire sequence is the “silent thought” process that gets exposed. In normal situations we expect, and receive, only the final response. In these long-context scenarios, however, Gemini outputs the entire thought process leading up to the final answer as the complete reply.
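As a stopgap, we are experimenting with a crude heuristic guard (purely our own workaround, not an official feature) that flags a response when it matches several telltale markers of the leaked structure above; if it trips, we simply retry the generation:

```python
import re

# Purely illustrative heuristic: in our case the leaked reasoning always
# follows the numbered self-analysis structure described above, so we look
# for several of its telltale markers before trusting a response.
LEAK_MARKERS = [
    r"^\s*\d\.\s",              # numbered analysis steps
    r"user'?s? input",          # meta-discussion of the conversation
    r"potential repl(?:y|ies)",
    r"final (?:response|answer)",
]


def looks_like_leaked_thoughts(text: str) -> bool:
    hits = sum(
        bool(re.search(pattern, text, re.IGNORECASE | re.MULTILINE))
        for pattern in LEAK_MARKERS
    )
    return hits >= 2  # require multiple markers to limit false positives
```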

Regarding your suggestion about examples in the prompt: our prompt focuses mainly on core task requirements, character profile information, and output formatting principles. It does not contain any examples intended to guide or demonstrate a thinking process. I suspect that Gemini is writing its thinking process into the Answer field itself.
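One way to verify this suspicion, sketched with the google-genai Python SDK: request thought summaries with `include_thoughts=True` and check whether the leaked text comes back in parts flagged as thoughts or in regular answer parts (the placeholder contents and system prompt below are illustrative):

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="...",  # our trimmed conversation history would go here
    config=types.GenerateContentConfig(
        system_instruction="...",  # our character system prompt
        thinking_config=types.ThinkingConfig(include_thoughts=True),
    ),
)

# Thought summaries come back as parts with part.thought == True; anything
# appearing in an unflagged part belongs to the visible answer.
for part in response.candidates[0].content.parts:
    label = "THOUGHT" if getattr(part, "thought", False) else "ANSWER"
    print(f"[{label}] {(part.text or '')[:120]}")
```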

It’s also worth noting that we have not yet configured a “thinking budget” for Gemini 2.5 Pro.
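If it helps anyone, configuring a budget looks roughly like this with the google-genai SDK; as far as I know, thinking cannot be fully disabled on 2.5 Pro, only bounded, and we have not yet tested whether an explicit budget changes the leakage behaviour:

```python
from google.genai import types

# Bound the model's internal reasoning. On Gemini 2.5 Pro the budget can be
# lowered but (to our knowledge) not set to zero.
config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_budget=1024)
)
```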