Hello, Google AI Team.
I am developing a [Character Roleplay Chat Service] using Gemini 3.0 Pro.
First, I want to share my highly positive impression of the model’s core performance.
The creativity the model shows in chat interactions in my target language region is outstanding. In particular, the “interactions between multiple NPC characters” and the “utilization of background context” have improved significantly. Even on the “Low” reasoning setting, the narrative richness and quality far exceed what we achieved with high reasoning-token budgets on Gemini 2.5 Pro.
However, I am facing a critical structural issue regarding the “Low” reasoning setting.
I operate a dual-tier service using both “High” and “Low” settings.
- High Setting (Premium): We anticipate 2,000+ reasoning tokens, allocate a large max_output_tokens budget, and charge a premium price. This works perfectly, and users are satisfied.
- Low Setting (Standard): This is intended as our affordable baseline, replacing Gemini 2.5 Pro. We priced this tier assuming a reasoning budget of 300–500 tokens, with a slight price increase that users accept. (A configuration sketch follows this list.)
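For concreteness, this is roughly how we configure the two tiers today. A minimal sketch using the google-genai Python SDK: the model ID, the thinking_level field, and all token numbers are our assumptions from the current docs, not authoritative values.

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

# Tier configs. Token budgets are illustrative of our pricing assumptions;
# thinking_level is our reading of the Gemini 3 reasoning control.
TIER_CONFIGS = {
    "premium": types.GenerateContentConfig(   # "High": 2,000+ reasoning tokens
        max_output_tokens=4096,
        thinking_config=types.ThinkingConfig(thinking_level="high"),
    ),
    "standard": types.GenerateContentConfig(  # "Low": priced for 300-500 reasoning tokens
        max_output_tokens=1536,
        thinking_config=types.ThinkingConfig(thinking_level="low"),
    ),
}

response = client.models.generate_content(
    model="gemini-3-pro-preview",             # assumed model ID
    contents="<roleplay turn with character and scene context>",
    config=TIER_CONFIGS["standard"],
)
print(response.text)
```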
The Fatal Problem with the “Low” Setting:
Since reasoning cannot be completely disabled in 3.0 Pro, “Low” is our only option for the standard tier. However, even with strict system prompts, the “Low” setting often behaves unpredictably, consuming 1,000+ reasoning tokens.
Because reasoning tokens and final response tokens share the same max_output_tokens limit, three problems follow:
- Cannibalization: The inflated reasoning process eats into the budget meant for the final response.
- Truncation: As a result, the user-facing response gets cut off mid-sentence.
- Economic Deadlock: We cannot simply raise max_output_tokens to prevent truncation, because that would drive costs beyond what is viable for a “Standard” price tier. (The budget arithmetic below makes this concrete.)
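To make the deadlock concrete, here is the arithmetic for the Standard tier; every number is illustrative of our pricing assumptions rather than a measured API value:

```python
# Illustrative Standard-tier budget arithmetic (all numbers are ours).
MAX_OUTPUT_TOKENS = 1536         # shared cap: reasoning + final response

planned_reasoning = 400          # 300-500 assumed when pricing the tier
observed_reasoning = 1100        # 1,000+ often seen in production on "Low"

planned_response_budget = MAX_OUTPUT_TOKENS - planned_reasoning    # 1136
observed_response_budget = MAX_OUTPUT_TOKENS - observed_reasoning  # 436

# If a typical roleplay reply needs ~600-900 tokens (illustrative figure),
# the remaining 436 tokens guarantee mid-sentence truncation. Raising the
# cap to ~2,000+ would fix truncation but break the tier's unit economics.
print(planned_response_budget, observed_response_budget)
```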
Feature Request / Immediate Fix Needed:
To make the “Low” setting commercially viable as a standard option, we need one of the following (a hypothetical request sketch follows the list):
- Hard Limit for Reasoning Tokens: Allow us to set a specific integer limit (e.g., max_reasoning_tokens: 300) separate from the total output. If the limit is hit, the model must stop reasoning and generate the response.
- Budget Separation: Distinct parameters for max_reasoning_tokens and max_response_tokens.
- “Ultra-Low” Mode: A mode strictly tuned for minimal reasoning (scratchpad level, <200 tokens) for low-latency, standard-tier applications.
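As a sketch of what we mean: the request shapes below are hypothetical, and every field marked PROPOSED does not exist in the API today; they are exactly the control we are requesting.

```python
# Hypothetical request shapes. Fields marked PROPOSED do not exist today;
# the names are the ones we are proposing, not real API parameters.
hard_limit_config = {
    "max_output_tokens": 1536,        # today's shared cap (could stay as-is)
    "thinking_config": {
        "thinking_level": "low",
        "max_reasoning_tokens": 300,  # PROPOSED: hard cap; on hitting it,
                                      # stop reasoning and emit the response
    },
}

budget_separation_config = {
    "max_reasoning_tokens": 300,      # PROPOSED: reasoning-only budget
    "max_response_tokens": 1024,      # PROPOSED: user-facing response budget
}
```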
Without this control, we cannot migrate our standard user base to Gemini 3.0 Pro.
Configuration:
- Model: Gemini 3.0 Pro
- Reasoning Setting: Low
- Current behavior: “Low” reasoning often exceeds expectations (1,000+ tokens), causing response truncation within the standard cost budget.
Thank you.