Here is an example prompt that reproduces this issue:
Task: Translate this text to native fluent English.
Text: Yardım bile ediyor bana küçük hanım.
Generation config:
from google import genai
from google.genai.types import GenerateContentConfig, ThinkingConfig

client = genai.Client()

response = client.models.generate_content(
    contents=[prompt],  # the prompt shown above
    model="gemini-2.5-flash-preview-09-2025",
    config=GenerateContentConfig(
        response_mime_type="application/json",
        # Thinking explicitly disabled.
        thinking_config=ThinkingConfig(thinking_budget=0),
    ),
)
Response usage_metadata:

cache_tokens_details=None
cached_content_token_count=None
candidates_token_count=9
candidates_tokens_details=None
prompt_token_count=29
prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=29)]
thoughts_token_count=478
tool_use_prompt_token_count=None
tool_use_prompt_tokens_details=None
total_token_count=516
traffic_type=None

Note that thoughts_token_count is 478 even though thinking_budget was set to 0.
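For completeness, these numbers can be read straight off the response object; a minimal sketch, using the response from the call above:

# Minimal sketch: inspect the usage metadata of the response above.
usage = response.usage_metadata
print("thoughts_token_count:", usage.thoughts_token_count)   # prints 478 here
print("candidates_token_count:", usage.candidates_token_count)  # prints 9
print("total_token_count:", usage.total_token_count)         # prints 516
# With thinking_budget=0 the expectation is 0 (or None) thought tokens.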
Interestingly, removing response_mime_type="application/json" resolves the issue, and the model consistently outputs 0 thinking tokens. But I need a JSON response, since I use structured outputs. gemini-2.5-flash-preview-05-20 does not exhibit this issue.
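For comparison, the control call I mean is the one below: identical to the repro except that response_mime_type is dropped.

# Control case: same prompt, model, and thinking budget, but no
# response_mime_type. Per the observation above, this consistently
# yields 0 thinking tokens.
response_plain = client.models.generate_content(
    contents=[prompt],
    model="gemini-2.5-flash-preview-09-2025",
    config=GenerateContentConfig(
        thinking_config=ThinkingConfig(thinking_budget=0),
    ),
)
print(response_plain.usage_metadata.thoughts_token_count)  # 0 / None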