Gemini-2.5-flash-preview-09-2025 breaks the thinking_budget parameter

With gemini-2.5-flash, in the Python SDK, I can use this thinking config to disable thinking tokens:

thinking_config=ThinkingConfig(
    thinking_budget=0
)

With gemini-2.5-flash-preview-09-2025, setting that has no impact: it still includes thinking tokens in the output. For example, the returned usage_metadata gave:

thoughts_token_count=15226 despite budget being set to 0.

This does not occur with Flash 05-20, which correctly returns 0 thinking tokens. The net result is that I'm being charged for thousands of extra tokens, and incurring extra latency, for tokens I explicitly requested not to receive.
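
For reference, the full call shape is roughly this (client setup as in the SDK quickstart; the contents string is just a placeholder for my actual prompt):

from google import genai
from google.genai.types import GenerateContentConfig, ThinkingConfig

client = genai.Client()  # picks up the API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash",  # swapping in gemini-2.5-flash-preview-09-2025 reproduces the problem
    contents=["<placeholder for my actual prompt>"],
    config=GenerateContentConfig(
        thinking_config=ThinkingConfig(thinking_budget=0),  # should disable thinking entirely
    ),
)
print(response.usage_metadata.thoughts_token_count)  # None/0 when the budget is respected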

Hi @Joe1,

Thanks for sharing! I’m trying to reproduce the issue where thinking_budget=0 isn’t being respected by the gemini-2.5-flash-preview-09-2025 model. My tests with simpler prompts show the expected behavior: thoughts_token_count=None confirms the budget is being respected, as shown below:

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-09-2025",
    contents="Explain the concept of Occam's Razor and provide a simple, everyday example.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0)  # Disables thinking
    ),
)

print(response.usage_metadata)

Output:

cache_tokens_details=None cached_content_token_count=None candidates_token_count=645 candidates_tokens_details=None prompt_token_count=18 prompt_tokens_details=[ModalityTokenCount(
  modality=<MediaModality.TEXT: 'TEXT'>,
  token_count=18
)] thoughts_token_count=None tool_use_prompt_token_count=None tool_use_prompt_tokens_details=None total_token_count=663 traffic_type=None

To help me debug this further, could you please share the exact prompt and any other relevant configuration you used? This will help me understand if specific prompt complexities are triggering the unexpected behavior.

Thank you!

Here is an example prompt that reproduces this issue:

Task: Translate this text to native fluent English.
        
Text: Yardım bile ediyor bana küçük hanım.

Generation config:

# prompt is the Task/Text string shown above
response = client.models.generate_content(
    contents=[prompt],
    model="gemini-2.5-flash-preview-09-2025",
    config=GenerateContentConfig(
        response_mime_type="application/json",
        thinking_config=ThinkingConfig(thinking_budget=0),
    ),
)

Response usage_metadata:

cache_tokens_details=None cached_content_token_count=None candidates_token_count=9 candidates_tokens_details=None prompt_token_count=29 prompt_tokens_details=[ModalityTokenCount(
  modality=<MediaModality.TEXT: 'TEXT'>,
  token_count=29
)] thoughts_token_count=478 tool_use_prompt_token_count=None tool_use_prompt_tokens_details=None total_token_count=516 traffic_type=None
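
For completeness, the side-by-side check amounts to something like this (same client, prompt, and config as above; only the model name changes):

for model in ["gemini-2.5-flash-preview-05-20", "gemini-2.5-flash-preview-09-2025"]:
    response = client.models.generate_content(
        model=model,
        contents=[prompt],
        config=GenerateContentConfig(
            response_mime_type="application/json",
            thinking_config=ThinkingConfig(thinking_budget=0),
        ),
    )
    print(model, response.usage_metadata.thoughts_token_count)  # 05-20: None, 09-2025: several hundred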

Interestingly, removing response_mime_type="application/json" resolves the issue, and the model consistently outputs 0 thinking tokens. But I need a JSON response, since I use structured outputs. gemini-2.5-flash-preview-05-20 does not exhibit this issue.
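
For context, the structured-output setup I rely on is essentially the following; the schema here is just an illustrative stand-in for my real one:

from pydantic import BaseModel

class Translation(BaseModel):  # illustrative schema, not my real one
    text: str

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-09-2025",
    contents=[prompt],
    config=GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Translation,
        thinking_config=ThinkingConfig(thinking_budget=0),
    ),
)
print(response.parsed)  # response parsed into the Translation model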

Is there any plan to fix this? It’s a clear regression.