I am using the Gemini API on Vertex AI with Gemini 3.0 Pro. The problem is that neither the thinking budget nor the thinking level stops the model from spending roughly 2,000 to 3,000 tokens on thinking per response. This causes real problems for us: it degrades OCR quality and blows up our token budget. I hope the internal team can fix this so the model is reliable enough to use in business settings. Thanks.
(For context: I'm using image input and explicit context caching.)
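For reference, here is a minimal sketch of the kind of request I'm making, written against the google-genai Python SDK. This is not my exact production code: the project, location, model ID, cache contents, and image path are placeholders, and `thinking_level` assumes a recent SDK version that supports Gemini 3 thinking controls.

```python
from google import genai
from google.genai import types

# Vertex AI client; project and location are placeholders.
client = genai.Client(vertexai=True, project="my-project", location="us-central1")

MODEL = "gemini-3-pro-preview"  # assumed model ID for Gemini 3.0 Pro

# Explicit context cache holding the shared OCR instruction.
cache = client.caches.create(
    model=MODEL,
    config=types.CreateCachedContentConfig(
        system_instruction="You are an OCR assistant. Transcribe the document exactly.",
    ),
)

with open("page.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model=MODEL,
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Transcribe all text in this image.",
    ],
    config=types.GenerateContentConfig(
        cached_content=cache.name,
        # Either of these is supposed to cap thinking, but in practice the
        # model still emits ~2,000-3,000 thinking tokens per response.
        thinking_config=types.ThinkingConfig(
            thinking_level="low",    # Gemini 3 style control
            # thinking_budget=128,   # Gemini 2.5 style control; same result
        ),
    ),
)

# Shows the thinking-token count that ignores the configured limit.
print(response.usage_metadata.thoughts_token_count)
```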