Bug: Gemini 2.5 Flash Lite candidatesTokenCount underreports output tokens by ~8.5x vs actual billing

Summary

The usageMetadata.candidatesTokenCount field in Gemini 2.5 Flash Lite API responses reports approximately 1/8.5th of the output tokens that Google Cloud Billing actually charges. This discrepancy does not occur with Gemini 3 Flash Preview — output tokens match exactly for that model using the same code path and logging.

Environment

  • API: Gemini API (AI Studio, not Vertex AI)
  • Endpoint: https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-lite:generateContent
  • SKU: 7133-23F2-04B7 — “Generate content output token count gemini 2.5 flash lite short output text non-thinking”

Reproduction

I built an Android app that calls the Gemini API for short text translation and grammar tasks. Every API response’s usageMetadata is logged to a JSONL file with promptTokenCount, candidatesTokenCount, and thoughtsTokenCount.

On 2026-02-19, I compared my analytics logs against Google Cloud Billing for the same day.

Evidence

Request count — matches

| Source | Requests |
| --- | --- |
| Google AI Studio dashboard | 1,170 |
| My analytics log | 1,143 |

~2% difference, confirming the logger captures essentially all requests.

Input tokens — matches

| Source | Flash Lite input tokens |
| --- | --- |
| Google Cloud Billing | 137,944 |
| My analytics log (morning session, ~63% of day) | ~135,600 |

Close match, confirming promptTokenCount is accurate.

Output tokens — 8.5x discrepancy (Flash Lite only)

| Source | Flash Lite output tokens |
| --- | --- |
| Google Cloud Billing | 297,203 |
| My analytics log (morning session, ~63% of day) | ~34,800 |
| Ratio | 8.5x |

Control: Gemini 3 Flash Preview — exact match

On the same day, using the same logging code, I made a single request to gemini-3-flash-preview:

| | Google Cloud Billing | My analytics |
| --- | --- | --- |
| Text input tokens | 255 | 255 (logged as part of 1,355 total) |
| Image input tokens | 1,100 | 1,100 (logged as part of 1,355 total) |
| Output tokens | 859 | 859 |

Output tokens match exactly for Gemini 3 Flash Preview. The discrepancy is isolated to Flash Lite.

Cost impact

  • 28-day billing: $12.65 for Flash Lite output tokens (SKU 7133-23F2-04B7)
  • Expected cost based on candidatesTokenCount: ~$1.49
  • Overcharge: ~$11.16 over 28 days
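As a sanity check, the billed-to-expected cost ratio reproduces the same ~8.5x factor seen in the token counts. A minimal sketch using only the 28-day figures above:

```kotlin
fun main() {
    val billedUsd = 12.65      // actual 28-day Flash Lite output-token charge
    val expectedUsd = 1.49     // cost implied by the logged candidatesTokenCount
    val ratio = Math.round(billedUsd / expectedUsd * 10) / 10.0
    println("billed/expected = ${ratio}x")   // 8.5x, matching the token-count ratio
}
```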

Logging code

The candidatesTokenCount is read directly from the API response:

```kotlin
// Deserialize the raw HTTP response body and pull out usageMetadata.
val geminiResponse = json.decodeFromString(GeminiResponse.serializer(), responseBody)
val usage = geminiResponse.usageMetadata

// Log exactly what the API reported; the token counts are not transformed.
analyticsLogger?.log(AnalyticsEvent(
    timestamp = startTime,
    type = requestType,
    model = model,
    inputTokens = usage?.promptTokenCount ?: 0,
    outputTokens = usage?.candidatesTokenCount ?: 0,
    thinkingTokens = usage?.thoughtsTokenCount ?: 0,
    latencyMs = System.currentTimeMillis() - startTime,
    success = true,
    retries = retries
))
```

The same code path is used for both Flash Lite and Gemini 3 Flash Preview requests. The only difference is the model URL.
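One client-side way to probe this further (a sketch, not the app's actual code) is to also log usageMetadata.totalTokenCount and check that the reported counts are self-consistent; a persistent gap would point to tokens that are metered but never surfaced in candidatesTokenCount:

```kotlin
// The field names mirror usageMetadata; this Usage data class is illustrative,
// not the app's real DTO.
data class Usage(
    val promptTokenCount: Int,
    val candidatesTokenCount: Int,
    val thoughtsTokenCount: Int,
    val totalTokenCount: Int,
)

// Tokens counted in the total but not attributed to prompt, candidates, or thoughts.
fun unexplainedTokens(u: Usage): Int =
    u.totalTokenCount - (u.promptTokenCount + u.candidatesTokenCount + u.thoughtsTokenCount)

fun main() {
    // Illustrative values shaped like a typical translation request, not real logs.
    val u = Usage(promptTokenCount = 54, candidatesTokenCount = 18,
                  thoughtsTokenCount = 0, totalTokenCount = 72)
    println(unexplainedTokens(u))   // 0 when the counts add up
}
```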

Typical request/response profile

These are short text tasks (game translation):

  • Translation: ~54 input tokens, ~18 output tokens, ~780ms
  • Grammar: ~325 input tokens, ~78 output tokens, ~1000ms

The responses are short text — there is no scenario where the model is generating 8.5x more output than reported.

Request

  1. Investigate why candidatesTokenCount for gemini-2.5-flash-lite reports ~1/8.5th of actual billed output tokens
  2. Clarify whether this is a billing error or a reporting error in usageMetadata
  3. If billing is correct, what accounts for the additional output tokens not reflected in candidatesTokenCount?
  4. Issue a billing adjustment if this is confirmed as a metering bug

I switched to Flash 2.0. It is a little slower than 2.5 Flash Lite but has higher output quality, and billing is back to normal — e.g., $0.04 today with similar input/output token volumes, versus $0.36 when using 2.5 Flash Lite.

Confirming the same issue — even worse ratio in my case.

I experienced the exact same billing discrepancy with gemini-2.5-flash-lite-preview-09-2025, and it also affects the stable gemini-2.5-flash-lite. My use case was background memory consolidation tasks (automated, not interactive), so the volume added up quickly before I noticed the billing anomaly.

My numbers (from AI Studio dashboard, 28-day window Jan 29 – Feb 25, 2026):

  • ~5,270 requests (Usage page: Total API Requests)
  • ~42M input tokens (Usage page: Input Tokens per model — Gemini 2.5 Flash Lite)
  • Billed: $132.87 (Spend page, “All models”)
  • Expected at Flash Lite rates ($0.10/1M input, $0.40/1M output): ~$5
  • Overcharge ratio: ~27x
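The ~$5 expected figure can be reconstructed from the stated rates and volumes. In this sketch the output-token total is an assumption (~519 logged requests at a few hundred output tokens each):

```kotlin
fun main() {
    val inputCost = 42_000_000 / 1e6 * 0.10    // $4.20 at $0.10 per 1M input tokens
    // Output volume assumption: ~519 logged requests x ~560 output tokens (high end)
    val outputCost = 519 * 560 / 1e6 * 0.40    // ≈ $0.12
    println(inputCost + outputCost)            // ≈ 4.32, i.e. roughly $5
}
```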

Additional anomaly: On the Spend page, selecting “All models” shows $132.87, but filtering by any individual model — including Gemini 2.5 Flash Lite — shows “There is no available usage data.” The cost cannot be attributed to any specific model in the dashboard, which makes it impossible to audit from the AI Studio side.

Sampled token counts from Logs: I checked individual request logs (all 519 logged entries are gemini-2.5-flash-lite-preview-09-2025 or gemini-2.5-flash-lite), and each request averages ~6,500–7,200 input tokens with only ~100–560 output tokens. These are short JSON responses — there’s no way the actual output justifies $132.87.

The output tokens appear to be charged at the gemini-2.5-flash rate ($2.50/1M) instead of the gemini-2.5-flash-lite rate ($0.40/1M) — which matches the ~6.25x multiplier and is consistent with the ~8.5x discrepancy you identified on the output token count.
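The rate arithmetic behind that hypothesis, using the list prices quoted above, is straightforward:

```kotlin
fun main() {
    val flashOutputRate = 2.50       // $ per 1M output tokens, gemini-2.5-flash
    val flashLiteOutputRate = 0.40   // $ per 1M output tokens, gemini-2.5-flash-lite
    println(flashOutputRate / flashLiteOutputRate)   // 6.25
}
```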

This billing issue appears to affect both the preview and stable variants of Gemini 2.5 Flash Lite.

I’ve since switched away from Flash Lite entirely. Has anyone had success getting a billing adjustment from Google for this?

Hi,

Appreciate the detailed information. To get this resolved, I recommend reaching out to our dedicated billing support.