Hi everyone,
I’m working with Gemini 2.5 Pro preview and trying to find a way to reduce or disable Thought reasoning. This has become increasingly important in my use case for the following reasons:
Why I want to reduce Thought reasoning:
- Empty string outputs when max_output_tokens is set
  When the model generates too many internal Thought tokens, it sometimes exhausts the token limit before producing a final answer, resulting in an empty or null output.
- Uncontrollable response latency
  Since we can’t control the depth or length of the Thought process, the model’s response time becomes unpredictable, which is problematic for latency-sensitive applications.
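To make the first point concrete, here is a toy model of what I think is happening. It assumes Thought tokens and the final answer are counted against the same max_output_tokens limit; that is my reading of the behavior, not something I have confirmed. The helper visible_token_budget is hypothetical and is not part of any Gemini SDK.

```python
def visible_token_budget(max_output_tokens: int, thought_tokens: int) -> int:
    """Return how many tokens remain for the final answer.

    Assumption (not confirmed): Thought tokens and answer tokens
    are drawn from the same max_output_tokens limit.
    """
    if max_output_tokens < 0 or thought_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Once Thought tokens reach the limit, nothing is left for the answer.
    return max(0, max_output_tokens - thought_tokens)


if __name__ == "__main__":
    limit = 1024
    for thought_tokens in (200, 900, 1024, 1500):
        remaining = visible_token_budget(limit, thought_tokens)
        status = "empty output" if remaining == 0 else "answer possible"
        print(f"thoughts={thought_tokens:>5} remaining={remaining:>4} -> {status}")
```

Under this model, any request whose Thought tokens reach the limit leaves zero tokens for the answer, which would match the empty strings I am seeing.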
My questions:
- Is there any way to reduce or turn off Thought reasoning via a system prompt?
  For example, can we use something like "Do not use Thought reasoning" in the system message, or are there other configuration-based controls?
- Does Google have any plans to give developers more control over Thought generation in future Gemini versions or API updates?
Thanks in advance for any insights!
Would love to hear from anyone who has experimented with prompt-level or system-level workarounds.