Critical Feedback: Mandatory "Thinking" in Gemini 3 Flash is a regression in UX and cost-efficiency

I am writing to express my strong frustration regarding the “Thinking” (Reasoning) feature in the new Gemini 3 Flash model.

In previous versions (like Gemini 2.x), we could disable thinking entirely by setting the thinking budget to 0. In Gemini 3, that control has been taken away. Even with the “Minimal” option selected, the model frequently runs reasoning passes that I neither requested nor needed.

As a developer building a custom chatbot, I’ve identified several critical issues in the API response structure that feel like a “token tax” rather than a feature:

1. Messy API Response Structure (Parsing Issues):
Looking at the API logs, the parts array is now split into multiple text blocks. For example:

  • parts[0]: “thought\n”
  • parts[1]: [Reasoning/Thinking content]
  • parts[2]: [Actual Bot Output]

Existing implementations that expect a single text block now fail or display “thought” to the end-user. This forces developers to rewrite their parsing logic just to filter out junk data we never asked for.
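
For anyone hitting the same problem, here is a rough sketch of the kind of filter I ended up needing. It assumes the parts arrive as plain dicts with a "text" key, that reasoning parts may carry a boolean "thought" flag, and that the layout otherwise matches the logs above (a bare “thought” marker part followed by one reasoning part). The helper name visible_text is my own, not part of any SDK, and the marker heuristic reflects only what I saw in my logs.

```python
from typing import Any


def visible_text(parts: list[dict[str, Any]]) -> str:
    """Join only the parts that should be shown to the end user.

    Assumptions (not confirmed by any official spec):
      * a part with a truthy "thought" key is reasoning content;
      * a part whose text is just "thought" is a marker, and the part
        right after it is the reasoning that belongs to that marker.
    """
    kept: list[str] = []
    skip_next = False
    for part in parts:
        text = part.get("text", "")
        if skip_next:
            # This part is the reasoning that followed a "thought" marker.
            skip_next = False
            continue
        if part.get("thought"):
            # Explicitly flagged reasoning part.
            continue
        if text.strip() == "thought":
            # Marker part: drop it and the reasoning part after it.
            skip_next = True
            continue
        kept.append(text)
    return "".join(kept)


if __name__ == "__main__":
    logged_parts = [
        {"text": "thought\n"},
        {"text": "The user wants a short greeting."},
        {"text": "Hello!"},
    ]
    print(visible_text(logged_parts))
```

A heuristic like this is brittle: a legitimate reply consisting only of the word “thought” would be dropped, which is exactly why I would prefer not to need it.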

2. Forced Token Inflation (Financial Burden):
Even for simple instructions (like following a specific persona or honorifics), the model generates unnecessary reasoning tokens. These tokens are billed, and worse, they must be included in the context for subsequent turns. This leads to a compounding increase in costs for data that adds zero value to the final output. It feels less like a technical necessity and more like a tactic to increase token billing.
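
To make the compounding effect concrete, here is a back-of-the-envelope sketch. The token counts are made-up illustration values, not measurements, and the model is deliberately simplified: it counts input and output tokens equally (real pricing differs per direction) and assumes the whole history is resent every turn. The function name and parameters are my own.

```python
def total_billed_tokens(
    turns: int,
    user_tokens: int,
    answer_tokens: int,
    thought_tokens: int,
    carry_thoughts: bool = True,
) -> int:
    """Sum prompt + output tokens over a multi-turn chat.

    Simplified model: every turn resends the full history as the prompt.
    If carry_thoughts is True, reasoning tokens are appended to the
    history as well, so they are paid for again on every later turn.
    """
    history = 0
    total = 0
    for _ in range(turns):
        prompt = history + user_tokens
        output = answer_tokens + thought_tokens
        total += prompt + output
        # What gets resent next turn: old history, this user message,
        # the answer, and (optionally) the reasoning.
        history = prompt + answer_tokens
        if carry_thoughts:
            history += thought_tokens
    return total


if __name__ == "__main__":
    # Hypothetical numbers: 10-token user messages, 20-token answers,
    # 50 reasoning tokens per turn, over a 3-turn conversation.
    print("no thinking:      ", total_billed_tokens(3, 10, 20, 0))
    print("thinking, dropped:", total_billed_tokens(3, 10, 20, 50, carry_thoughts=False))
    print("thinking, carried:", total_billed_tokens(3, 10, 20, 50))
```

Under these assumptions the gap between the three cases widens with every additional turn, because carried reasoning tokens are re-billed as input each time.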

3. Latency Issues:
Flash models are chosen for their speed. Forcing a “Thinking” phase, even a minimal one, destroys the low-latency advantage of the Flash tier.

4. Request for Action:
“Minimal” is not enough because it is unpredictable. We need a “Disabled” (0 budget) option back. If a user wants a direct, fast, and cost-effective response, the model should not be forced to “think” and waste tokens.

I strongly urge the Google AI team to restore the toggle to completely disable Thinking before the final production release. Please stop forcing unwanted reasoning tokens on developers who prioritize efficiency and control.