I am asking Gemini 2.5 Flash, via the latest GenAI JS SDK, to extract information from a PNG image. I call generateContent and pass a config like so:
config: {
  systemInstruction: `look at the picture and extract the parts`,
  temperature: 0,
  thinkingConfig: {
    thinkingBudget: 4096,
  },
  maxOutputTokens: 8096,
}
I also pass a JSON schema. The problem is that the model seems to ignore the thinking budget and have "runaway thoughts": if I increase the max tokens, it keeps thinking until it hits the max tokens.
EDIT:
I just removed the JSON schema from the call and it seems to respect the thinking budget now, so the schema appears to be the problem. Also, if I reduce the thinkingBudget to something like 1024, it seems to respect it more often.
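To compare the two cases side by side, the config can be built with and without the schema. This is only a sketch: buildConfig is a hypothetical helper, and the schema shown here is made up for illustration, since the actual schema is not included in the post. The responseMimeType and responseSchema fields are the ones the SDK uses for structured output, as far as I know.

```javascript
// Sketch for an A/B comparison: same config, with or without a response schema.
// buildConfig and the schema below are hypothetical; the real schema is not shown in the thread.
function buildConfig(withSchema) {
  const config = {
    systemInstruction: "look at the picture and extract the parts",
    temperature: 0,
    thinkingConfig: { thinkingBudget: 4096 },
    maxOutputTokens: 8096,
  };
  if (withSchema) {
    // Structured output is requested by adding these two fields.
    config.responseMimeType = "application/json";
    config.responseSchema = {
      type: "ARRAY",
      items: {
        type: "OBJECT",
        properties: { name: { type: "STRING" } },
        required: ["name"],
      },
    };
  }
  return config;
}

console.log(JSON.stringify(buildConfig(true), null, 2));
```

Passing buildConfig(false) and buildConfig(true) to otherwise identical calls should isolate whether the schema is what triggers the overrun.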
I shared the steps above: give it a thinking budget together with a JSON response schema. My config has a thinkingConfig with a budget of 4096. If I ask Gemini to extract the parts from an image, it runs over the budget until it runs out of tokens. If I remove the JSON schema, it respects the limit.
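One way to confirm the overrun from the response itself is to compare the reported thinking tokens against the budget. A minimal sketch, assuming the response exposes usageMetadata.thoughtsTokenCount and candidates[0].finishReason (that is how I understand the SDK's response shape); checkThinkingOverrun is a hypothetical helper, and the sample object is hand-written, not real model output.

```javascript
// Hypothetical helper: flags a response whose thinking overran the budget.
// Assumes usageMetadata.thoughtsTokenCount and candidates[0].finishReason
// are present on the generateContent response.
function checkThinkingOverrun(response, thinkingBudget) {
  const thoughts = response?.usageMetadata?.thoughtsTokenCount ?? 0;
  const finishReason = response?.candidates?.[0]?.finishReason ?? "UNKNOWN";
  return {
    thoughts,
    finishReason,
    overBudget: thoughts > thinkingBudget, // thinking used more than allowed
    truncated: finishReason === "MAX_TOKENS", // output cut off at maxOutputTokens
  };
}

// Hand-written sample shaped like a runaway response (illustrative values only).
const sample = {
  candidates: [{ finishReason: "MAX_TOKENS" }],
  usageMetadata: { thoughtsTokenCount: 8000 },
};
console.log(checkThinkingOverrun(sample, 4096));
```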
I would say the instructions from cor are correct.
It doesn't happen all the time, maybe 1 in 50 times, but I'm processing thousands and thousands of requests, so that's how it can happen.
I think if you have a script that makes a request to Gemini 2.5 Flash with structured output
and loop it 1000 times, you should hit it at least once.
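A rough sketch of such a loop is below. countOverruns is a hypothetical helper, and callModel is an injected async function you would write yourself to wrap your own generateContent call and resolve to its response; the checks assume the response carries usageMetadata.thoughtsTokenCount and candidates[0].finishReason.

```javascript
// Hypothetical repro loop. `callModel` is an injected async function that
// should wrap your own generateContent call and resolve to its response.
async function countOverruns(callModel, runs, thinkingBudget) {
  let overruns = 0;
  for (let i = 0; i < runs; i++) {
    const response = await callModel(i);
    const thoughts = response?.usageMetadata?.thoughtsTokenCount ?? 0;
    const finishReason = response?.candidates?.[0]?.finishReason;
    // Count a run if thinking exceeded the budget or output hit the token cap.
    if (thoughts > thinkingBudget || finishReason === "MAX_TOKENS") {
      overruns++;
      console.log(`run ${i}: thoughts=${thoughts} finishReason=${finishReason}`);
    }
  }
  return overruns;
}
```

For a real run, callModel could be something like () => ai.models.generateContent({ model, contents, config }), with runs set to 1000 and thinkingBudget set to 4096. Requests run one at a time here to keep the sketch simple, so a large loop will be slow.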
I'm experiencing the same issue, but only with the latest gemini-2.5-flash-preview-09-2025 model. The code in my post reproduces it every time. It only happens when you request JSON output.