Gemini-3-flash-preview: billing ~50× higher than expected — hidden thinking tokens not reported by deprecated SDK

Single multimodal call (JPEG + short prompt), google.generativeai v0.8.6, zero config:

```
prompt_token_count     = 1,107
candidates_token_count = 1,193
total_token_count      = 4,598
→ hidden gap           = 2,298 tokens (total - prompt - candidates; not broken out by the SDK)
```

With longer prompts, the gap grows to 5,800–15,000 tokens/call. One call produced zero visible output but 6,109 hidden tokens.

Billing check: balance before = $8.77, after one call = ~$8.82. Delta ~$0.05 for a call that should cost ~$0.001 based on visible tokens.
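The ~50× in the title, and the average rate that the observed charge implies, can be reproduced from the figures above. This is a minimal sketch. It assumes the ~$0.05 balance change is attributable entirely to the single call in the usage table (4,598 total tokens), and that ~$0.001 is the estimate based on visible tokens only. `implied_rate_per_million` is a hypothetical helper. Its result is a blended average to compare against the published per-token prices for the model. It is not a price itself.

```python
def implied_rate_per_million(cost_usd: float, tokens: int) -> float:
    """Average USD per 1M tokens if `cost_usd` paid for exactly `tokens` tokens."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return cost_usd / tokens * 1_000_000


# Figures from this report (all approximate)
observed_delta = 0.05     # balance change around one call
expected_cost = 0.001     # estimate from visible tokens only
visible = 1_107 + 1_193   # prompt_token_count + candidates_token_count
total = 4_598             # total_token_count

print(round(observed_delta / expected_cost))                       # → 50
print(round(implied_rate_per_million(observed_delta, total), 1))   # → 10.9
print(round(implied_rate_per_million(expected_cost, visible), 2))  # → 0.43
```

If the published output rate for the model is well below the implied ~$10.9 per million tokens, hidden tokens alone would not explain the charge. In that case the balance snapshot probably includes something besides this call.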

We believe the gap is thinking tokens, which the deprecated SDK does not expose (usage_metadata has no thoughts_token_count attribute). Setting thinking_budget=0 has no effect. max_output_tokens is a shared budget covering both thinking and output: setting it to 2048 leaves only ~80 tokens of actual content.
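For comparison, the newer google-genai SDK reports thinking tokens separately as `usage_metadata.thoughts_token_count` and accepts a `thinking_config` in the request. Below is a sketch of the same multimodal call through that SDK. It assumes `google-genai` is installed and `GEMINI_API_KEY` is set. Whether gemini-3-flash-preview honours `thinking_budget=0` is exactly what is in question here. The sketch only makes the token split visible; it does not claim to disable thinking. `usage_breakdown` and `transcribe_with_google_genai` are hypothetical helper names.

```python
import os
from types import SimpleNamespace


def usage_breakdown(usage) -> dict:
    """Split a usage_metadata object into token buckets.

    Fields that are absent or None (e.g. thoughts_token_count on the
    deprecated SDK) count as 0; "unaccounted" is what remains of
    total_token_count after the other buckets.
    """
    def field(name):
        return getattr(usage, name, None) or 0

    prompt = field("prompt_token_count")
    output = field("candidates_token_count")
    thoughts = field("thoughts_token_count")
    total = field("total_token_count")
    return {
        "prompt": prompt,
        "output": output,
        "thoughts": thoughts,
        "unaccounted": total - prompt - output - thoughts,
    }


def transcribe_with_google_genai(path: str, thinking_budget: int = 0):
    """Sketch: the reproduction's call via google-genai (needs network + key)."""
    from google import genai          # imported lazily: optional dependency
    from google.genai import types

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    with open(path, "rb") as f:
        image = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")
    response = client.models.generate_content(
        model="gemini-3-flash-preview",
        contents=[image, "Transcribe this."],
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_budget=thinking_budget),
        ),
    )
    return response.text, usage_breakdown(response.usage_metadata)


# Offline check with the deprecated-SDK figures from this report (no thoughts field)
reported = SimpleNamespace(
    prompt_token_count=1107, candidates_token_count=1193, total_token_count=4598
)
print(usage_breakdown(reported))
# → {'prompt': 1107, 'output': 1193, 'thoughts': 0, 'unaccounted': 2298}
```

Called as `text, usage = transcribe_with_google_genai("image.jpeg")`, a non-zero `usage["thoughts"]` with `thinking_budget=0` would confirm that the budget is being ignored for this model.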

For our use case (OCR/transcription), thinking is not just unnecessary; it actively degrades quality. With thinking enabled, the model wraps characters in LaTeX, inserts spaces between letters, and hallucinates content that is not in the source image. We measured a character error rate (CER) 18 points higher with thinking than without. We need a way to disable it fully.
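For readers unfamiliar with the metric, here is a minimal sketch. It assumes the common definition: CER is the character-level edit distance between the transcription and the reference, divided by the reference length. It also assumes "18 points" means an absolute difference of 18 percentage points. Function names are illustrative. The example shows how one of the failure modes described above, spaces inserted between letters, inflates CER even when every letter is correct.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of `hypothesis` against a non-empty `reference`."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


reference = "text"
print(cer(reference, "text"))     # → 0.0
print(cer(reference, "t e x t"))  # → 0.75 (3 inserted spaces / 4 reference chars)
```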

gemini-2.0-flash and gemini-2.0-flash-lite both return 404 on our account, so we cannot fall back to a model without thinking.
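A 404 can also mean a model name is simply not offered to a given key. This sketch lists the models the key can reach, using the deprecated SDK's `genai.list_models()`. It keeps those whose `supported_generation_methods` include `"generateContent"`. The live call needs network access and `GEMINI_API_KEY`. The offline objects and their model names below are hypothetical stand-ins, so the filter can be tried without a key.

```python
import os
from types import SimpleNamespace


def generate_content_models(models) -> list[str]:
    """Names of models that list generateContent among their methods."""
    return [
        m.name
        for m in models
        if "generateContent" in (getattr(m, "supported_generation_methods", None) or [])
    ]


def reachable_models_for_key() -> list[str]:
    """Ask the API which models this key can call (needs network + key)."""
    import google.generativeai as genai  # deprecated SDK, as in the report

    genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
    return generate_content_models(genai.list_models())


# Hypothetical offline stand-ins for Model objects
offline = [
    SimpleNamespace(name="models/example-chat",
                    supported_generation_methods=["generateContent", "countTokens"]),
    SimpleNamespace(name="models/example-embed",
                    supported_generation_methods=["embedContent"]),
]
print(generate_content_models(offline))  # → ['models/example-chat']
```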

Related: cost explosion thread

Reproduction:

```python
import os

import google.generativeai as genai

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-3-flash-preview")

with open("image.jpeg", "rb") as f:
    img = f.read()

r = model.generate_content([{"mime_type": "image/jpeg", "data": img}, "Transcribe this."])
m = r.usage_metadata
# Tokens counted in the total but reported neither as prompt nor as output
print(m.total_token_count - m.prompt_token_count - m.candidates_token_count)
# → ~2,300 hidden tokens, billing ~$0.05 vs expected ~$0.001
```

Questions:

  1. Does the total_token_count gap represent thinking tokens?

  2. Is there a way to disable thinking on gemini-3-flash-preview?

  3. What rate applies to these hidden tokens?

Python 3.12, Ubuntu 24.04, JPEG 1086×1541 px.

Hi,

Would it be possible to DM me your project number?

Hi Mustansir,

Thank you for looking into this.

Happy to provide any additional information (logs, reproduction scripts, SDK versions) if that helps.

Best regards,
Geoffroy van der Straeten
Paleion