Thinking output on gemini-live-2.5-flash-preview model

Sometimes gemini-live-2.5-flash-previewmodel with live connection responds with text similar to:

<thinking>The user confirmed their location. This means the user is on time and there are no delays. Therefore, I should call the `save_details` function with the collected information. The `result` should be 'ON_TIME', and `potential_delays` should be 'None'.</thinking>

Code looks like this:

        client = genai.Client(
            api_key=self.api_key,
        )

        model = "gemini-live-2.5-flash-preview"
        temperature = 0.2
        seed = 51
        top_predictions = 0.7

        config = types.LiveConnectConfig(
            response_modalities=[types.Modality.TEXT],
            output_audio_transcription=types.AudioTranscriptionConfig(),
            input_audio_transcription=types.AudioTranscriptionConfig(),
            system_instruction=self.config_manager.get_system_instruction(
                voice_name=self.voice_name,
            ),
            tools=[types.Tool(
                function_declarations=self.parse_functions(),
            )],
            thinking_config=types.ThinkingConfig(
                include_thoughts=False,
                thinking_budget=0,
            ),
            seed=seed,
            temperature=temperature,
            top_p=top_predictions,
            max_output_tokens=300,
        )

No matter how much I try to tell in system_instruction to avoid outputting thoughts or stop thinking it doesn’t stops randomly doing that, can anybody from Google tell me how to avoid having this, or maybe if it is a bug, help solving it?

I know that there is new model for Live API gemini-2.5-flash-native-audio-preview-09-2025 but problem with using it is that in the scenario I’m trying to achieve there is only requirement to pass audio speech and receive text responses, which isn’t support by that model, and with above config I’m getting error:

Cannot extract voices from a non-audio request.; then sent 1007 (invalid frame payload data) Cannot extract voices from a non-audio request.

Have you found a solution?