Gemini-2.5-flash-preview-04-17 not honoring thinking_budget=0

The documentation above clearly states that setting thinking_budget=0 disables thinking.

However, I’m not actually able to consistently disable thinking with gemini-2.5-flash-preview-04-17.

I’m using the Python google-genai 1.11.0 package.

The calling code looks something like this:

response = self.client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents=contents,
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=res_schema,
        max_output_tokens=max_output_tokens,
        thinking_config=types.ThinkingConfig(thinking_budget=0),
        temperature=0.0,
    ),
)

However, the response still sometimes includes thought tokens, for example thoughts_token_count=112, with noticeably higher latency than when the model does not think.
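For reference, a simple check like the following (a sketch, not part of my actual application code) will flag whenever thought tokens show up despite the zero budget:

# Hypothetical guard: warn when thought tokens appear despite thinking_budget=0
usage = response.usage_metadata
if usage is not None and usage.thoughts_token_count:
    print(f"Unexpected thinking: {usage.thoughts_token_count} thought tokens")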

It does not happen consistently; it seems to depend on the contents as well as some randomness.

Here is a full Python script that can reproduce this problem:

from google import genai
from google.genai import types

res = genai.Client().models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents=[
        "Translate the following text to English.",
        "『ディシディア ファイナルファンタジー』(DISSIDIA FINAL FANTASY)は、スクウェア・エニックスより2008年に発売されたPSP専用のコンピュータゲームである。",
    ],
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema={
            "type": "object",
            "properties": {
                "translation": {"type": "STRING"},
            },
            "required": ["translation"],
        },
        thinking_config=types.ThinkingConfig(thinking_budget=0),
        temperature=0.0,
    ),
)

print(res.text)
print(res.usage_metadata)

Output:

{"translation": "\"DISSIDIA FINAL FANTASY\" is a computer game exclusively for the PSP, released by Square Enix in 2008."}
cache_tokens_details=None cached_content_token_count=None candidates_token_count=141 candidates_tokens_details=None prompt_token_count=52 prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=52)] thoughts_token_count=107 tool_use_prompt_token_count=None tool_use_prompt_tokens_details=None total_token_count=193 traffic_type=None

As you can see, despite thinking_config=types.ThinkingConfig(thinking_budget=0), thinking was triggered (thoughts_token_count=107).
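To show how intermittent this is, here is a small extension of the script above (a sketch, assuming GOOGLE_API_KEY is set in the environment) that re-runs the same request several times and counts how many responses contain thought tokens:

from google import genai
from google.genai import types

client = genai.Client()
config = types.GenerateContentConfig(
    response_mime_type="application/json",
    response_schema={
        "type": "object",
        "properties": {
            "translation": {"type": "STRING"},
        },
        "required": ["translation"],
    },
    thinking_config=types.ThinkingConfig(thinking_budget=0),
    temperature=0.0,
)

runs = 10
thought_runs = 0
for _ in range(runs):
    res = client.models.generate_content(
        model="gemini-2.5-flash-preview-04-17",
        contents=[
            "Translate the following text to English.",
            "『ディシディア ファイナルファンタジー』(DISSIDIA FINAL FANTASY)は、スクウェア・エニックスより2008年に発売されたPSP専用のコンピュータゲームである。",
        ],
        config=config,
    )
    usage = res.usage_metadata
    # Count runs where the model produced thought tokens despite thinking_budget=0
    if usage is not None and usage.thoughts_token_count:
        thought_runs += 1

print(f"{thought_runs}/{runs} responses contained thought tokens")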

I have the same issue.

It’s still in preview.

In some rare cases, the model still thinks a little even with thinking budget = 0. We are hoping to fix this before we make this model stable, and you won’t be billed for that thinking; thinking budget = 0 is what triggers the billing switch.

Hi @koa,

Welcome to the forum. I tried to replicate the issue with your provided code snippet but was not able to reproduce it. Does the issue still persist on your side?

Thank you.

Thanks for testing it out for me. I can confirm that the issue is no longer present for me either. It seems that the issue has been fixed.
