Latest @google/genai with 2.5 Flash ignoring thinking budget

I am asking Gemini 2.5 Flash, via the latest @google/genai JS SDK, to extract information from a PNG image. I call generateContent and pass a config like this:

config: {
  systemInstruction: `look at the picture and extract the parts`,
  temperature: 0,
  thinkingConfig: {
    thinkingBudget: 4096,
  },
  maxOutputTokens: 8096
}

I also pass a JSON response schema. The problem is that the model seems to ignore the thinking budget and has "runaway thoughts": if I increase the max output tokens, it keeps thinking until it hits that limit.

usageMetadata: {
  promptTokenCount: 1886,
  totalTokenCount: 9981,
  trafficType: 'ON_DEMAND',
  promptTokensDetails: [ [Object], [Object] ],
  thoughtsTokenCount: 8095
}
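
For context, here is roughly what the full call looks like. This is a minimal sketch rather than my exact code: the image file name, prompt text, and responseSchema shape are illustrative stand-ins.

import { GoogleGenAI, Type } from "@google/genai";
import { readFileSync } from "node:fs";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Illustrative: load the PNG and pass it as base64 inline data.
const base64Png = readFileSync("parts.png").toString("base64");

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: [
    {
      role: "user",
      parts: [
        { inlineData: { mimeType: "image/png", data: base64Png } },
        { text: "Extract the parts from this picture." },
      ],
    },
  ],
  config: {
    systemInstruction: `look at the picture and extract the parts`,
    temperature: 0,
    thinkingConfig: {
      thinkingBudget: 4096,
    },
    maxOutputTokens: 8096,
    // Removing the two response* lines below makes the budget behave (see the EDIT).
    responseMimeType: "application/json",
    responseSchema: {
      type: Type.OBJECT,
      properties: {
        parts: { type: Type.ARRAY, items: { type: Type.STRING } },
      },
    },
  },
});

console.log(response.usageMetadata);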

EDIT:
I just removed the JSON schema from the call, and it now seems to respect the thinking budget, so that appears to be the problem. Also, if I reduce the thinkingBudget to something like 1024, it respects the budget more often.


This is a pretty bad bug that makes Gemini nearly unusable for any complex thinking task where you need a structured output format. When will this be fixed?


Can someone from Google please look at this?

Really shocked Google hasn't even responded to or acknowledged this issue. It makes thinking unusable with JSON schemas.


Exactly the same thing I'm experiencing.

Hi @Justine_Chang,

Can you share a scenario where this behavior is observed?

That would help us reproduce and investigate the issue.

I shared the steps above: you set a thinking budget together with a JSON response schema. My config has a thinkingConfig with a budget of 4096. If I ask Gemini to extract the parts from an image, it runs over the budget until it runs out of tokens. If I remove the JSON schema, it respects the limit.

I would say the instructions from cor are correct.
It doesn't happen every time, maybe 1 in 50 requests, but I'm processing thousands and thousands of requests, so that's how it shows up.

I think if you have a script that makes a request to Gemini 2.5 Flash and asks for structured output, and you loop it 1000 times, you should hit it at least once. A rough sketch of such a loop is below.
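
The prompt and schema in this sketch are just placeholders; it simply records how often thoughtsTokenCount overruns the configured budget.

import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const BUDGET = 4096;

let overruns = 0;
for (let i = 0; i < 1000; i++) {
  const res = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: "List three parts of a bicycle.", // placeholder prompt
    config: {
      thinkingConfig: { thinkingBudget: BUDGET },
      maxOutputTokens: 8096,
      responseMimeType: "application/json",
      responseSchema: { type: Type.ARRAY, items: { type: Type.STRING } },
    },
  });
  const thoughts = res.usageMetadata?.thoughtsTokenCount ?? 0;
  if (thoughts > BUDGET) {
    overruns += 1;
    console.log(`run ${i}: thoughtsTokenCount=${thoughts} exceeded budget ${BUDGET}`);
  }
}
console.log(`${overruns} overruns out of 1000 runs`);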

I'm experiencing the same issue, but only with the latest gemini-2.5-flash-preview-09-2025 model. The code in my post reproduces it every time. It only happens when you request JSON output.

Hi @all,

Thanks for flagging this issue.

A fix for this issue was rolled out earlier this week.

I just tested using the gemini Flash-Lite-Latest model and it's working fine. Please check whether you are still facing this issue.

I just tested thinking_budget=0 on gemini-2.5-flash-preview-09-2025 and it continues to output thinking tokens, as discussed here.
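
A minimal check along these lines shows the behavior; the prompt and schema here are placeholders, not my actual request.

import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const res = await ai.models.generateContent({
  model: "gemini-2.5-flash-preview-09-2025",
  contents: "Return the numbers 1 to 3 as a JSON array.", // placeholder prompt
  config: {
    thinkingConfig: { thinkingBudget: 0 }, // thinking should be disabled entirely
    responseMimeType: "application/json",
    responseSchema: { type: Type.ARRAY, items: { type: Type.INTEGER } },
  },
});

// Expected to be 0 or absent; the bug is that thought tokens are still reported.
console.log(res.usageMetadata?.thoughtsTokenCount);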