With gemini-2.5-flash-preview-09-2025, setting thinking_budget=0 has no impact: the model still includes thinking tokens in the output. For example, the returned usage_metadata reported:
thoughts_token_count=15226 despite the budget being set to 0.
This does not occur with Flash 05-20, which correctly returns 0 thinking tokens. The net result is that I'm being charged for thousands of extra tokens, plus extra latency, for tokens I explicitly requested not to receive.
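The comparison described above can be sketched as a small helper that flags the discrepancy; the function name and the 05-20 value of None are illustrative, but the 15226 figure is taken from the report:

```python
def thinking_tokens_leaked(thoughts_token_count, budget=0):
    """Return True if thinking tokens were billed despite a zero budget.

    usage_metadata.thoughts_token_count may be None when no thinking
    occurred, so treat None as 0 before comparing.
    """
    return budget == 0 and (thoughts_token_count or 0) > 0

# Values from the report:
print(thinking_tokens_leaked(15226))  # 09-2025 preview: budget ignored -> True
print(thinking_tokens_leaked(None))   # 05-20: budget respected -> False
```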
Thanks for sharing! I'm trying to reproduce the issue where thinking_budget=0 isn't respected by the gemini-2.5-flash-preview-09-2025 model. My tests with simpler prompts show the expected behavior: thoughts_token_count=None confirms the budget is being respected, as shown below.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-09-2025",
    contents="Explain the concept of Occam's Razor and provide a simple, everyday example.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0)  # Disables thinking
    ),
)
print(response.usage_metadata)
To help me debug this further, could you please share the exact prompt and any other relevant configuration you used? This will help me understand if specific prompt complexities are triggering the unexpected behavior.
Interestingly, removing response_mime_type="application/json" resolves the issue, and the model consistently reports 0 thinking tokens. But I need a JSON response, since I use structured outputs. gemini-2.5-flash-preview-05-20 does not exhibit this issue.
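For anyone trying to reproduce this, the triggering combination can be sketched as a raw generateContent request body: a zero thinking budget together with a JSON response MIME type and a schema. The prompt and schema below are placeholders, not the reporter's actual ones; only the request shape matters.

```python
import json

# Hypothetical request body for the Gemini API generateContent endpoint.
# The reported bug: with responseMimeType "application/json" present,
# thinkingBudget: 0 is ignored and thinking tokens are still billed.
body = {
    "contents": [{"parts": [{"text": "Summarize Occam's Razor in one sentence."}]}],
    "generationConfig": {
        "thinkingConfig": {"thinkingBudget": 0},
        "responseMimeType": "application/json",
        "responseSchema": {  # placeholder schema for illustration
            "type": "OBJECT",
            "properties": {"summary": {"type": "STRING"}},
            "required": ["summary"],
        },
    },
}
print(json.dumps(body, indent=2))
```

Dropping the responseMimeType and responseSchema keys from generationConfig is, per the report, enough to make the budget take effect again.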