Gemini-2.5-flash-preview-04-17 not honoring thinking_budget=0

The documentation above clearly states that setting thinking_budget=0 disables thinking.

However, I’m not actually able to consistently disable thinking with gemini-2.5-flash-preview-04-17.

I’m using the Python google-genai 1.11.0 package.

The calling code looks something like this:

response = self.client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents=contents,
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=res_schema,
        max_output_tokens=max_output_tokens,
        thinking_config=types.ThinkingConfig(thinking_budget=0),
        temperature=0.0,
    ),
)

However, the response still sometimes includes thought tokens, for example thoughts_token_count=112, with noticeably higher latency than when the model does not think.
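For reference, a simple check like the following (a sketch, not part of my actual application code) will flag whenever thought tokens show up despite the zero budget:

# Hypothetical guard: warn when thought tokens appear despite thinking_budget=0
usage = response.usage_metadata
if usage is not None and usage.thoughts_token_count:
    print(f"Unexpected thinking: {usage.thoughts_token_count} thought tokens")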

It does not happen consistently; it seems to depend on the contents as well as some randomness.

Here is a full Python script that can reproduce this problem:

from google import genai
from google.genai import types

res = genai.Client().models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents=[
        "Translate the following text to English.",
        "『ディシディア ファイナルファンタジー』(DISSIDIA FINAL FANTASY)は、スクウェア・エニックスより2008年に発売されたPSP専用のコンピュータゲームである。",
    ],
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema={
            "type": "object",
            "properties": {
                "translation": {"type": "STRING"},
            },
            "required": ["translation"],
        },
        thinking_config=types.ThinkingConfig(thinking_budget=0),
        temperature=0.0,
    ),
)

print(res.text)
print(res.usage_metadata)

Output:

{"translation": "\"DISSIDIA FINAL FANTASY\" is a computer game exclusively for the PSP, released by Square Enix in 2008."}
cache_tokens_details=None cached_content_token_count=None candidates_token_count=141 candidates_tokens_details=None prompt_token_count=52 prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=52)] thoughts_token_count=107 tool_use_prompt_token_count=None tool_use_prompt_tokens_details=None total_token_count=193 traffic_type=None

As you can see, despite thinking_config=types.ThinkingConfig(thinking_budget=0), thinking was triggered (thoughts_token_count=107).
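To show how intermittent this is, here is a small extension of the script above (a sketch, assuming GOOGLE_API_KEY is set in the environment) that re-runs the same request several times and counts how many responses contain thought tokens:

from google import genai
from google.genai import types

client = genai.Client()
config = types.GenerateContentConfig(
    response_mime_type="application/json",
    response_schema={
        "type": "object",
        "properties": {
            "translation": {"type": "STRING"},
        },
        "required": ["translation"],
    },
    thinking_config=types.ThinkingConfig(thinking_budget=0),
    temperature=0.0,
)

runs = 10
thought_runs = 0
for _ in range(runs):
    res = client.models.generate_content(
        model="gemini-2.5-flash-preview-04-17",
        contents=[
            "Translate the following text to English.",
            "『ディシディア ファイナルファンタジー』(DISSIDIA FINAL FANTASY)は、スクウェア・エニックスより2008年に発売されたPSP専用のコンピュータゲームである。",
        ],
        config=config,
    )
    usage = res.usage_metadata
    # Count runs where the model produced thought tokens despite thinking_budget=0
    if usage is not None and usage.thoughts_token_count:
        thought_runs += 1

print(f"{thought_runs}/{runs} responses contained thought tokens")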

I have the same issue.

It’s still in preview.

In some rare cases, the model still thinks a little even with thinking budget = 0. We are hoping to fix this before we make this model stable, and you won’t be billed for that thinking; thinking budget = 0 is what triggers the billing switch.

Hi @koa,

Welcome to the forum. I tried to replicate the issue with your provided code snippet but was not able to reproduce it. Does the issue still persist on your side?

Thank you.

Thanks for testing it out for me. I can confirm that the issue is no longer present for me either. It seems that the issue has been fixed.
