Bug: Gemini 2.5 Flash Lite candidatesTokenCount underreports output tokens by ~8.5x vs actual billing

Summary

The usageMetadata.candidatesTokenCount field in Gemini 2.5 Flash Lite API responses reports approximately 1/8.5th of the output tokens that Google Cloud Billing actually charges. This discrepancy does not occur with Gemini 3 Flash Preview — output tokens match exactly for that model using the same code path and logging.

Environment

  • API: Gemini API (AI Studio, not Vertex AI)
  • Endpoint: https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-lite:generateContent
  • SKU: 7133-23F2-04B7 — “Generate content output token count gemini 2.5 flash lite short output text non-thinking”

Reproduction

I built an Android app that calls the Gemini API for short text translation and grammar tasks. Every API response’s usageMetadata is logged to a JSONL file with promptTokenCount, candidatesTokenCount, and thoughtsTokenCount.

On 2026-02-19, I compared my analytics logs against Google Cloud Billing for the same day.

Evidence

Request count — matches

| Source | Requests |
| --- | --- |
| Google AI Studio dashboard | 1,170 |
| My analytics log | 1,143 |

~2% difference, confirming the logger captures essentially all requests.

Input tokens — matches

| Source | Flash Lite input tokens |
| --- | --- |
| Google Cloud Billing | 137,944 |
| My analytics log (morning session, ~63% of day) | ~135,600 |

Close match, confirming promptTokenCount is accurate.

Output tokens — 8.5x discrepancy (Flash Lite only)

| Source | Flash Lite output tokens |
| --- | --- |
| Google Cloud Billing | 297,203 |
| My analytics log (morning session, ~63% of day) | ~34,800 |
| Ratio | 8.5x |

Control: Gemini 3 Flash Preview — exact match

On the same day, using the same logging code, I made a single request to gemini-3-flash-preview:

| | Google Cloud Billing | My analytics |
| --- | --- | --- |
| Text input tokens | 255 | 255 (logged as part of 1,355 total) |
| Image input tokens | 1,100 | 1,100 (logged as part of 1,355 total) |
| Output tokens | 859 | 859 |

Output tokens match exactly for Gemini 3 Flash Preview. The discrepancy is isolated to Flash Lite.

Cost impact

  • 28-day billing: $12.65 for Flash Lite output tokens (SKU 7133-23F2-04B7)
  • Expected cost based on candidatesTokenCount: ~$1.49
  • Overcharge: ~$11.16 over 28 days
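As a sanity check, the billed-to-expected cost ratio reproduces the same ~8.5x factor seen in the token counts. A minimal sketch using only the 28-day figures above:

```kotlin
fun main() {
    val billedUsd = 12.65      // actual 28-day Flash Lite output-token charge
    val expectedUsd = 1.49     // cost implied by the logged candidatesTokenCount
    val ratio = Math.round(billedUsd / expectedUsd * 10) / 10.0
    println("billed/expected = ${ratio}x")   // 8.5x, matching the token-count ratio
}
```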

Logging code

The candidatesTokenCount is read directly from the API response:

```kotlin
// Deserialize the raw HTTP response body and pull out usageMetadata.
val geminiResponse = json.decodeFromString(GeminiResponse.serializer(), responseBody)
val usage = geminiResponse.usageMetadata

// Log exactly what the API reported; the token counts are not transformed.
analyticsLogger?.log(AnalyticsEvent(
    timestamp = startTime,
    type = requestType,
    model = model,
    inputTokens = usage?.promptTokenCount ?: 0,
    outputTokens = usage?.candidatesTokenCount ?: 0,
    thinkingTokens = usage?.thoughtsTokenCount ?: 0,
    latencyMs = System.currentTimeMillis() - startTime,
    success = true,
    retries = retries
))
```

The same code path is used for both Flash Lite and Gemini 3 Flash Preview requests. The only difference is the model URL.
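One client-side way to probe this further (a sketch, not the app's actual code) is to also log usageMetadata.totalTokenCount and check that the reported counts are self-consistent; a persistent gap would point to tokens that are metered but never surfaced in candidatesTokenCount:

```kotlin
// The field names mirror usageMetadata; this Usage data class is illustrative,
// not the app's real DTO.
data class Usage(
    val promptTokenCount: Int,
    val candidatesTokenCount: Int,
    val thoughtsTokenCount: Int,
    val totalTokenCount: Int,
)

// Tokens counted in the total but not attributed to prompt, candidates, or thoughts.
fun unexplainedTokens(u: Usage): Int =
    u.totalTokenCount - (u.promptTokenCount + u.candidatesTokenCount + u.thoughtsTokenCount)

fun main() {
    // Illustrative values shaped like a typical translation request, not real logs.
    val u = Usage(promptTokenCount = 54, candidatesTokenCount = 18,
                  thoughtsTokenCount = 0, totalTokenCount = 72)
    println(unexplainedTokens(u))   // 0 when the counts add up
}
```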

Typical request/response profile

These are short text tasks (game translation):

  • Translation: ~54 input tokens, ~18 output tokens, ~780ms
  • Grammar: ~325 input tokens, ~78 output tokens, ~1000ms

The responses are short text — there is no scenario where the model is generating 8.5x more output than reported.

Request

  1. Investigate why candidatesTokenCount for gemini-2.5-flash-lite reports ~1/8.5th of actual billed output tokens
  2. Clarify whether this is a billing error or a reporting error in usageMetadata
  3. If billing is correct, what accounts for the additional output tokens not reflected in candidatesTokenCount?
  4. Issue a billing adjustment if this is confirmed as a metering bug

I switched to Flash 2.0. It is a little slower than 2.5 Flash Lite but has higher output quality, and billing is back to normal — e.g., $0.04 today with similar input/output token volumes, versus $0.36 when using 2.5 Flash Lite.

Confirming the same issue — even worse ratio in my case.

I experienced the exact same billing discrepancy with gemini-2.5-flash-lite-preview-09-2025, and it also affects the stable gemini-2.5-flash-lite. My use case was background memory consolidation tasks (automated, not interactive), so the volume added up quickly before I noticed the billing anomaly.

My numbers (from AI Studio dashboard, 28-day window Jan 29 – Feb 25, 2026):

  • ~5,270 requests (Usage page: Total API Requests)
  • ~42M input tokens (Usage page: Input Tokens per model — Gemini 2.5 Flash Lite)
  • Billed: $132.87 (Spend page, “All models”)
  • Expected at Flash Lite rates ($0.10/1M input, $0.40/1M output): ~$5
  • Overcharge ratio: ~27x
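The ~$5 expected figure can be reconstructed from the stated rates and volumes. In this sketch the output-token total is an assumption (~519 logged requests at a few hundred output tokens each):

```kotlin
fun main() {
    val inputCost = 42_000_000 / 1e6 * 0.10    // $4.20 at $0.10 per 1M input tokens
    // Output volume assumption: ~519 logged requests x ~560 output tokens (high end)
    val outputCost = 519 * 560 / 1e6 * 0.40    // ≈ $0.12
    println(inputCost + outputCost)            // ≈ 4.32, i.e. roughly $5
}
```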

Additional anomaly: On the Spend page, selecting “All models” shows $132.87, but filtering by any individual model — including Gemini 2.5 Flash Lite — shows “There is no available usage data.” The cost cannot be attributed to any specific model in the dashboard, which makes it impossible to audit from the AI Studio side.

Sampled token counts from Logs: I checked individual request logs (all 519 logged entries are gemini-2.5-flash-lite-preview-09-2025 or gemini-2.5-flash-lite), and each request averages ~6,500–7,200 input tokens with only ~100–560 output tokens. These are short JSON responses — there’s no way the actual output justifies $132.87.

The output tokens appear to be charged at the gemini-2.5-flash rate ($2.50/1M) instead of the gemini-2.5-flash-lite rate ($0.40/1M) — which matches the ~6.25x multiplier and is consistent with the ~8.5x discrepancy you identified on the output token count.
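The rate arithmetic behind that hypothesis, using the list prices quoted above, is straightforward:

```kotlin
fun main() {
    val flashOutputRate = 2.50       // $ per 1M output tokens, gemini-2.5-flash
    val flashLiteOutputRate = 0.40   // $ per 1M output tokens, gemini-2.5-flash-lite
    println(flashOutputRate / flashLiteOutputRate)   // 6.25
}
```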

This billing issue appears to affect both the preview and stable variants of Gemini 2.5 Flash Lite.

I’ve since switched away from Flash Lite entirely. Has anyone had success getting a billing adjustment from Google for this?

Hi,

Appreciate the detailed information. To get this resolved, I recommend reaching out to our dedicated billing support.