Thinking output on gemini-live-2.5-flash-preview model

Sometimes gemini-live-2.5-flash-previewmodel with live connection responds with text similar to:

<thinking>The user confirmed their location. This means the user is on time and there are no delays. Therefore, I should call the `save_details` function with the collected information. The `result` should be 'ON_TIME', and `potential_delays` should be 'None'.</thinking>

Code looks like this:

        client = genai.Client(
            api_key=self.api_key,
        )

        model = "gemini-live-2.5-flash-preview"
        temperature = 0.2
        seed = 51
        top_predictions = 0.7

        config = types.LiveConnectConfig(
            response_modalities=[types.Modality.TEXT],
            output_audio_transcription=types.AudioTranscriptionConfig(),
            input_audio_transcription=types.AudioTranscriptionConfig(),
            system_instruction=self.config_manager.get_system_instruction(
                voice_name=self.voice_name,
            ),
            tools=[types.Tool(
                function_declarations=self.parse_functions(),
            )],
            thinking_config=types.ThinkingConfig(
                include_thoughts=False,
                thinking_budget=0,
            ),
            seed=seed,
            temperature=temperature,
            top_p=top_predictions,
            max_output_tokens=300,
        )

No matter how much I try to tell in system_instruction to avoid outputting thoughts or stop thinking it doesn’t stops randomly doing that, can anybody from Google tell me how to avoid having this, or maybe if it is a bug, help solving it?

I know that there is new model for Live API gemini-2.5-flash-native-audio-preview-09-2025 but problem with using it is that in the scenario I’m trying to achieve there is only requirement to pass audio speech and receive text responses, which isn’t support by that model, and with above config I’m getting error:

Cannot extract voices from a non-audio request.; then sent 1007 (invalid frame payload data) Cannot extract voices from a non-audio request.

Have you found a solution?

Hi @Kirill_Morozov, Welcome to AI Forum,
I successfully ran the test to reproduce the unexpected <thinking> output you’re encountering.
I observed that with a configuration very similar to yours, I was not able to get the <thinking> </thinking> text in the response, suggesting that the issue might be intermittent or related to a specific part of your environment.
For you to re-test, I’ve attached the code snippet I used. I made a few minor stability modifications, specifically:

  1. Reinforcing the System Instruction to aggressively prohibit the <thinking> tag.
  2. Increasing max_output_tokens to ensure the stream completes without hanging
config = types.LiveConnectConfig(
            response_modalities=[types.Modality.TEXT],
            output_audio_transcription=types.AudioTranscriptionConfig(),
            input_audio_transcription=types.AudioTranscriptionConfig(),
            system_instruction=(
                "You are an assistant determining if the user is on time and calling `save_details`. "
            "CRITICAL: DO NOT output any text enclosed in <thinking> or <tool_use> tags. "
            "Respond directly with the final function call or a brief conversational wrap-up."
            ),
            tools=[types.Tool(
                function_declarations=self.parse_functions(), 
            )],
            thinking_config=types.ThinkingConfig(
                include_thoughts=False,
                thinking_budget=0,      
            ),
            seed=self.seed,
            temperature=self.temperature,
            top_p=self.top_predictions,
           max_output_tokens=1024, 
        )

Could you try running your code again with these small adjustments and let us know if you are still facing an issue?
Note :- As the above model will soon be deprecated, I suggest to use our latest stable models https://ai.google.dev/gemini-api/docs/models