Hi!
Sometimes, when the model is set to a low thinking_level, or thinking_budget is set to 128 or 256, it unexpectedly uses around 3,000 thought tokens, even though the task is almost identical to others. This happens with both temperature=1.0 and temperature=0.0, and it significantly affects API costs. I’d really appreciate it if this could be fixed quickly. Thank you.
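Roughly, the setup looks like this (a minimal sketch with the @google/genai TypeScript SDK for illustration; the Python SDK uses the equivalent snake_case fields, and the model id, prompt, and budget value here are placeholders rather than my exact setup):

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: "gemini-3-pro-preview", // assumed model id
  contents: "Summarize this paragraph in one sentence: …", // placeholder prompt
  config: {
    thinkingConfig: {
      thinkingBudget: 128, // I also tried thinkingLevel: "low" on Gemini 3
    },
  },
});

console.log(response.text);
// Usually around 100 here, but occasionally it jumps to ~3,000:
console.log(response.usageMetadata?.thoughtsTokenCount);
```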
For now, I solved this by adding the system prompt “Please don’t think too much!”
I experimented with temperature, top_p, and raising the thinking budget to a fairly large 512 tokens, but none of it worked.
And even that workaround fails often. I really need some help here.
Hi @komin, thank you for bringing this to our attention. Could you please provide an example of the prompt you are using with Gemini 3? This will help us diagnose the issue.
I’ve sent a DM including my prompt. Please check it. Thanks.
I have the same issue. `config.thinkingConfig.thinkingLevel = "low"` works fine only for shorter contexts. When given a context of 5,000 or more tokens, thinking also starts growing uncontrollably, reaching thousands of tokens.
When iterating over chunks from `generateContentStream`, the first 3-4 chunks often contain thoughts only. The thoughts in each chunk are of acceptable length, but Gemini keeps elaborating on its thought headers across multiple chunks in a row before it starts generating the answer.
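Here is roughly how I observe it (a minimal sketch with the @google/genai SDK; the model id, the includeThoughts flag, and the dialogue placeholder are illustration-only assumptions, not my exact code):

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Placeholder for the ~5000-token dialogue context.
const dialogueContext = "…long dialogue between the two characters…";

const stream = await ai.models.generateContentStream({
  model: "gemini-3-pro-preview", // assumed model id
  contents: dialogueContext,
  config: {
    thinkingConfig: { thinkingLevel: "low", includeThoughts: true },
  },
});

for await (const chunk of stream) {
  for (const part of chunk.candidates?.[0]?.content?.parts ?? []) {
    if (part.thought) {
      // The first 3-4 chunks often contain only these thought parts.
      console.log("THOUGHT:", part.text);
    } else if (part.text) {
      process.stdout.write(part.text);
    }
  }
  // usageMetadata (with thoughtsTokenCount) shows up on the final chunk.
  if (chunk.usageMetadata?.thoughtsTokenCount !== undefined) {
    console.log("\nthoughtsTokenCount:", chunk.usageMetadata.thoughtsTokenCount);
  }
}
```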
I tried a system prompt with `IMPORTANT: As a large language model, do not think, generate the final response immediately. You have already used too many thought tokens and will be heavily punished for exceeding the quota.` but it did not help at all.
For example, when asked to continue a dialogue between two characters and write the response for a single character, Gemini 3 Pro returned chunks with the following thought headers, reaching thoughtsTokenCount = 1741:
**Exploring Anton's Dilemma**
**Unpacking Anton's Pressure**
**Perfecting Anton's Words**
**Crafting Anton's Ending**
**Focusing Anton's Final Words**
**Continuing Anton's Speech**
**Perfecting Anton's Ending**
The only thing that worked (but it disabled thinking completely) was to send the context with a fake last message containing
<thought></thought>
or
<think></think>
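In case it helps anyone reproduce this, the fake last message looks roughly like this (a sketch; using a model-role turn for the fake message is my assumption here, and as noted it suppresses thinking entirely rather than capping it):

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Existing conversation turns (placeholder content).
const dialogueTurns = [
  { role: "user", parts: [{ text: "…dialogue context…" }] },
];

// Workaround: append a fake last message holding an empty think block so the
// model behaves as if thinking is already finished. Using role "model" here
// (a prefill-style turn) is an assumption; adjust to your own convention.
const response = await ai.models.generateContent({
  model: "gemini-3-pro-preview", // assumed model id
  contents: [
    ...dialogueTurns,
    { role: "model", parts: [{ text: "<think></think>" }] },
  ],
});

console.log(response.text);
```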
Thanks for sharing your experience. I’ll try your suggestion! My prompt is about 3,000 tokens. The problem is that most of the time it thinks for about 100 tokens, but roughly once every 50 prompts it goes wild. An interesting point is that when I forcibly decrease the thinking tokens, the accuracy of the multimodal capabilities decreases too. Maybe it’s about the model? Anyway, thanks a lot!
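In case it helps with reproduction, this is roughly how I catch the spikes (a sketch; the prompt text, model id, and loop count are placeholders):

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Re-run the same ~3000-token prompt and log thoughtsTokenCount each time,
// to catch the roughly 1-in-50 runs where it jumps from ~100 to ~3,000.
const prompt = "…my ~3000-token prompt…"; // placeholder

for (let i = 0; i < 50; i++) {
  const response = await ai.models.generateContent({
    model: "gemini-3-pro-preview", // assumed model id
    contents: prompt,
    config: { thinkingConfig: { thinkingLevel: "low" } },
  });
  console.log(i, response.usageMetadata?.thoughtsTokenCount);
}
```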
Hi @komin ,
Apologies for the delayed response, and thank you for sharing the details. Using clear system instructions and keeping prompts concise usually helps maintain consistent and cost-efficient token usage, so could you please try that? Please let me know if the issue persists.
Thanks.