Hi everyone,
I’m working with Gemini 2.5 Pro preview and trying to find a way to reduce or disable Thought reasoning. This has become increasingly important in my use case for the following reasons:
Why I want to reduce Thought reasoning:
- Empty string outputs when
max_output_tokens
is set
When the model generates too many internal Thought tokens, it sometimes exhausts the token limit before producing a final answer, resulting in an empty or null output.
- Uncontrollable response latency
Since we can’t control the depth or length of the Thought process, the model’s response time becomes unpredictable, which is problematic for latency-sensitive applications.
My questions:
- Is there any way to reduce or turn off Thought reasoning via a system prompt?
For example, can we use something like "Do not use Thought reasoning"
in the system message or other configuration-based controls?
- Does Google have any plan to give developers more control over Thought generation in future Gemini versions or API updates?
Thanks in advance for any insights!
Would love to hear from anyone who has experimented with prompt-level or system-level workarounds.
2 Likes
Hi @Jongmin_Oh , Welcome to the forum.
As far as I understand, you can’t disable thought reasoning in the 2.5-pro
model. If you want to control thought reasoning, you can opt for the 2.5-flash
model, where it can be managed using the thinkingBudget
parameter.
1 Like
Thank you for your response. I also find it unfortunate that Thought reasoning cannot be disabled in the 2.5 Pro model.
The reason we set max_output_tokens
is to allow some level of predictability over the output length and cost. However, since the tokens used for internal Thought reasoning are also counted, it’s difficult to accurately estimate the final output and associated cost.
What’s even more frustrating is when the model returns an empty string — it feels like we’re paying for tokens without getting any usable output.
While I’m genuinely impressed by the model’s performance, I hope these issues will be improved in future updates.
2 Likes
Here is a partial workaround that sometimes reduces thinking steps and sometimes skips thinking completely.
Append something this to your prompt:
SELF_TALK: off
REASONING: off
THINKING: off
PLANNING: off
Reply immediately without thinking or any effort. Prioritize speed over accuracy. Do not state what the user said. Do not think, analyze or plan - go with your gut feeling.
2 Likes
Update:
Effective is also including:
THINKING_BUDGET: < 10 words
As of May 22nd, I don’t believe these extenders work that well anymore 