Single multimodal call (JPEG + short prompt), google.generativeai v0.8.6, zero config:
prompt_token_count = 1,107
candidates_token_count = 1,193
total_token_count = 4,598
→ hidden gap = 2,298 tokens (not exposed in SDK)
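The gap is plain arithmetic on the three fields the SDK does expose. A minimal check (the helper name and the `Usage` stand-in are ours, not SDK types):

```python
def hidden_tokens(usage):
    """Tokens counted in total_token_count but absent from prompt + candidates."""
    return (usage.total_token_count
            - usage.prompt_token_count
            - usage.candidates_token_count)

# Stand-in with the exact numbers reported above.
class Usage:
    prompt_token_count = 1107
    candidates_token_count = 1193
    total_token_count = 4598

print(hidden_tokens(Usage))  # → 2298
```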
With longer prompts, the gap grows to 5,800–15,000 tokens/call. One call produced zero visible output but 6,109 hidden tokens.
Billing check: balance before = $8.77, after one call = ~$8.82. Delta ~$0.05 for a call that should cost ~$0.001 based on visible tokens.
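Assuming the balance delta is entirely attributable to this one call, the overcharge factor follows from the figures above:

```python
visible_cost = 0.001          # expected cost estimated from visible tokens alone
observed_delta = 8.82 - 8.77  # balance before minus balance after, ~$0.05

overcharge = observed_delta / visible_cost
print(f"delta ≈ ${observed_delta:.2f}, ≈ {overcharge:.0f}× the visible-token estimate")
```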
We believe the gap is thinking tokens that the deprecated SDK does not expose (there is no thoughts_token_count attribute on usage_metadata). Setting thinking_budget=0 has no effect. max_output_tokens is a shared budget (thinking + output): with it set to 2048, only ~80 tokens of actual content come back.
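A defensive way to probe whether the installed SDK surfaces a thinking-token field at all, falling back to the residual when it does not (sketch; the field name follows newer SDK versions and the helper is our own):

```python
from types import SimpleNamespace

def thinking_tokens(usage):
    """Return the reported thinking-token count if the SDK exposes it,
    otherwise the residual gap: total - (prompt + visible output)."""
    reported = getattr(usage, "thoughts_token_count", None)
    if reported is not None:
        return reported
    return (usage.total_token_count
            - usage.prompt_token_count
            - usage.candidates_token_count)

# Old SDK shape: usage_metadata has no thoughts_token_count attribute.
old = SimpleNamespace(prompt_token_count=1107,
                      candidates_token_count=1193,
                      total_token_count=4598)
print(thinking_tokens(old))  # → 2298
```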
For our use case (OCR/transcription), thinking is not just unnecessary — it actively degrades quality. With thinking enabled, the model produces LaTeX-wrapped characters, inserts spaces between letters, and hallucinates content that isn’t in the source image. We measured +18 points of character error rate with thinking vs without. We need a way to fully disable it.
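The CER figure comes from a standard character-level edit-distance computation; a minimal version (our own sketch, not the exact evaluation harness we ran):

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance via dynamic programming (two-row table)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edits needed to reach the reference, per ref char."""
    return levenshtein(reference, hypothesis) / len(reference)

# The failure mode we see with thinking enabled: spaces inserted between letters.
print(cer("energy", "e n e r g y"))  # → 0.833… (5 spurious insertions / 6 chars)
```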
gemini-2.0-flash and gemini-2.0-flash-lite both return 404 on our account, so we cannot fall back to a model without thinking.
Related: cost explosion thread
Reproduction:

```python
import os

import google.generativeai as genai

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-3-flash-preview")

with open("image.jpeg", "rb") as f:
    img = f.read()

r = model.generate_content([{"mime_type": "image/jpeg", "data": img},
                            "Transcribe this."])
m = r.usage_metadata
print(m.total_token_count - m.prompt_token_count - m.candidates_token_count)
# → ~2,300 hidden tokens, billed ~$0.05 vs expected ~$0.001
```
Questions:

- Does the `total_token_count` gap represent thinking tokens?
- Is there a way to disable thinking on `gemini-3-flash-preview`?
- What rate applies to these hidden tokens?
Python 3.12, Ubuntu 24.04, JPEG 1086×1541 px.