Gemini 2.5 Flash Overthinking by a lot

ycastor · July 3, 2025, 11:35am

We are having a quite annoying issue with Gemini 2.5 Flash thinking mode, it overshoots the thinking budget by A LOT, eating pretty much the whole output length. If happens like this:

We set the thinkingBudget to a value like 1512 and the maxOutputTokens to 4000, most of the time it keeps between 1300~2100, which is ok, doable. But about in 20% of the calls the thinking just explodes to 3000~3500, leaving almost no output for the task itself. So we increased the maxOutput to 6000, and the same issue prevails, but now overshooting to 5000~5500.

If you run the same prompt, sometimes it happens, sometimes it does not, even with a very low temperature (0.02). I know the thinkingBudget is more of a “suggestion”, but overshoot by 3x is kind of annoying.

Is this a known issue?

Lalit_Kumar · July 4, 2025, 7:40am

Hello,

Welcome to the Forum!!

Have you tried specifying the thinking budget using the following code block:

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Provide a list of 3 famous physicists and their key contributions",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=1024)
        # Turn off thinking:
        # thinking_config=types.ThinkingConfig(thinking_budget=0)
        # Turn on dynamic thinking:
        # thinking_config=types.ThinkingConfig(thinking_budget=-1)
    ),
)

print(response.text)

We tried recreating your issue but observed that the thinking token count remains below specified limits.

For more detail, you can check Gemini API documentation.

ycastor · July 7, 2025, 11:31am

Hello Lalit!

Yes, we are using the thinkingBudget correctly, we noticed that if we put a value like 1024, it tends to overflow less. But when we put per example, 1512~2048 it often explodes.

Lalit_Kumar · July 8, 2025, 5:04am

Hi,

We tried to run Gemini flash 2.5 with maxOutputTokens = 4000 and thinking_budget =1600, temperature = 0.8, we ran it 10 times and observed that the thought token count remained within the specified limit.

To reproduce your issue would you be able to share your code and prompt with us?

Stefan_Adelbert · July 8, 2025, 6:31am

I have noticed the same behaviour. This issue has been raised with evidence at the ADK github issues list (Thinking config for 2.5 models · Issue #1018 · google/adk-python · GitHub).

Lalit_Kumar · July 28, 2025, 10:33am

Hi,

Could you please provide your code and exact prompt along with your configuration. So that I can reproduce your issue?

cor · September 5, 2025, 2:45pm

I am noticing the same issue and just made a post and shared my steps here

Topic		Replies	Views
Latest @google/genai with 2.5 flash ignoring thinking budget Gemini API generative-ai , gemini-flash	10	283	October 10, 2025
Gemini-2.5-flash-preview-04-17 not honoring thinking_budget=0 Gemini API help_request	5	1481	April 22, 2025
Gemini 2.5 Flash problems while trying to deactivate thinking Gemini API models , thinking	3	292	August 18, 2025
`max_output_tokens` isn't respected when using `gemini-2.5-flash` model Gemini API bug	7	218	October 4, 2025
How To disable Thinking using Gemini 2.5 Flash? thinkingBudget: 0 not working Gemini API help_request , gemini-flash	1	1659	April 23, 2025

Gemini 2.5 Flash Overthinking by a lot

Related topics