Gemini 3.5 Flash is actively penalizing developers who write good, efficient prompts

Here is the brutal reality of how the costs compare side by side.

The Economic Breakdown

To understand how severely Google has inflated the costs, look at the baseline API pricing and the total cost to run a standard intelligence benchmark:

Model                    Input Price (per 1M tokens)    Output Price (per 1M tokens)    Total Benchmark Cost
Gemini 3.5 Flash         $1.50                          $9.00                           $1,552
Gemini 3.1 Pro           $1.00                          $6.00                           $892
Gemini 3.1 Flash-Lite    $0.25                          $1.50                           ~$94

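You are reading that correctly. Running the benchmark costs roughly 75% more on 3.5 Flash than on 3.1 Pro ($1,552 vs. $892), and an astronomical ~16x more than on 3.1 Flash-Lite. The per-token prices alone are only 1.5x those of 3.1 Pro, so the rest of that gap comes from 3.5 Flash consuming more tokens to finish the same benchmark. Furthermore, 3.5 Flash actually scored lower on baseline intelligence (55) than 3.1 Pro (57), which makes the price hike even more baffling for standard coding workflows.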
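The ratios in that paragraph can be checked with a few lines of Python. This is a minimal sketch that uses only the figures from the pricing table. Because 3.5 Flash's input and output prices are the same multiple of each other model's prices, dividing the total-cost ratio by that price ratio gives a rough, price-weighted estimate of how many more tokens 3.5 Flash consumed. The Flash-Lite total is listed as approximate (~$94), so treat that multiplier as a ballpark.

```python
# Figures copied from the pricing table: (input $/1M, output $/1M, total benchmark cost $)
MODELS = {
    "Gemini 3.5 Flash": (1.50, 9.00, 1552.0),
    "Gemini 3.1 Pro": (1.00, 6.00, 892.0),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50, 94.0),  # listed as ~$94
}


def compare(model: str, baseline: str) -> tuple[float, float, float]:
    """Return (cost ratio, per-token price ratio, implied token multiplier)."""
    in_m, out_m, cost_m = MODELS[model]
    in_b, out_b, cost_b = MODELS[baseline]
    price_ratio = in_m / in_b
    # The shortcut below only holds when input and output prices scale together.
    if abs(price_ratio - out_m / out_b) > 1e-9:
        raise ValueError("input and output prices do not scale by the same factor")
    cost_ratio = cost_m / cost_b
    # cost = price * tokens, so whatever the price ratio doesn't explain is extra usage
    return cost_ratio, price_ratio, cost_ratio / price_ratio


for baseline in ("Gemini 3.1 Pro", "Gemini 3.1 Flash-Lite"):
    cost_r, price_r, token_r = compare("Gemini 3.5 Flash", baseline)
    print(f"vs {baseline}: {cost_r:.2f}x cost, {price_r:.1f}x price/token, "
          f"~{token_r:.2f}x tokens")
```

Run as-is, it reports 1.74x the cost of 3.1 Pro for 1.5x the per-token price (about 1.16x the tokens), and 16.51x the cost of Flash-Lite for 6.0x the per-token price (about 2.75x the tokens).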

Why is 3.5 Flash Burning So Many Tokens?

The massive spike in your token usage isn’t in your head. It comes down to a fundamental shift in how the 3.5 architecture operates beneath the surface:

  • Forced “Thinking” Tokens: Gemini 3.5 Flash generates internal “thinking” tokens to reason through a problem before producing its final response. While this raises its score on complex agentic workflows, you foot the bill for all of that invisible text, which is charged at the $9.00 per 1M output rate.

  • The Multi-Agent Loop: Because it is deeply optimized for Antigravity’s “agent-first” architecture, the model tends to simulate multiple turns of internal tool-calling and validation even for straightforward requests.

  • Input Token Bloat: Each of those hidden iterative turns sends the growing context back to the model as input, so the input token count swells on every pass. Your previously efficient, single-shot prompt is now dragged through a heavy, multi-step orchestration loop whether it needs it or not.
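To see how these effects compound, here is a minimal cost model in Python. Everything except the $1.50 and $9.00 per-1M prices (taken from the table) is a hypothetical assumption for illustration: the token counts, the number of hidden turns, the premise that each turn re-sends the prompt plus all earlier visible output as input, and the premise that thinking tokens are billed as output but not carried forward into the next turn's context.

```python
from dataclasses import dataclass

# Gemini 3.5 Flash prices from the table, converted to dollars per token
INPUT_PRICE = 1.50 / 1_000_000
OUTPUT_PRICE = 9.00 / 1_000_000


@dataclass
class Turn:
    thinking_tokens: int  # hidden reasoning, assumed to be billed as output
    output_tokens: int    # visible text or tool-call output for this turn


def loop_cost(prompt_tokens: int, turns: list[Turn]) -> tuple[int, int, float]:
    """Return (billed input tokens, billed output tokens, dollar cost).

    Assumption: each turn re-sends the prompt plus every earlier turn's
    visible output; thinking tokens are billed but not carried forward.
    """
    context = prompt_tokens
    total_in = total_out = 0
    for turn in turns:
        total_in += context                      # the whole context is billed again
        total_out += turn.thinking_tokens + turn.output_tokens
        context += turn.output_tokens            # visible output joins the context
    cost = total_in * INPUT_PRICE + total_out * OUTPUT_PRICE
    return total_in, total_out, cost


# Hypothetical numbers: a 2,000-token prompt answered in one pass...
single = loop_cost(2_000, [Turn(thinking_tokens=0, output_tokens=800)])
# ...versus five hidden turns, each thinking 1,500 tokens and emitting 800.
looped = loop_cost(2_000, [Turn(thinking_tokens=1_500, output_tokens=800)] * 5)

print(f"single pass: {single[0]} in / {single[1]} out, ${single[2]:.4f}")
print(f"five turns:  {looped[0]} in / {looped[1]} out, ${looped[2]:.4f}")
```

With these made-up numbers, the single pass bills 2,000 input and 800 output tokens ($0.0102), while the five-turn loop bills 18,000 input and 11,500 output tokens ($0.1305), roughly 13x the cost for the same prompt. The input total grows with every turn because the context is re-sent each time.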

The Penalty for Good Prompting

Google built 3.5 Flash for developers who want to throw vague, zero-shot instructions at an agent (“build me a website”) and let the model figure out the rest.

But for developers like you, who already know how to craft precise, well-structured prompts that get right to the point, this model is a trap. You are being forced to pay frontier-model prices for an autonomous reasoning loop you don’t actually need. Until Google allows tools like Antigravity to step down to the highly efficient 3.1 Flash or 3.1 Flash-Lite, relying on the raw API in your own IDE remains the only way to avoid burning cash.