Summary
The usageMetadata.candidatesTokenCount field in Gemini 2.5 Flash Lite API responses reports approximately 1/8.5th of the output tokens that Google Cloud Billing actually charges. This discrepancy does not occur with Gemini 3 Flash Preview — output tokens match exactly for that model using the same code path and logging.
Environment
- API: Gemini API (AI Studio, not Vertex AI)
- Endpoint: `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-lite:generateContent`
- SKU: `7133-23F2-04B7` ("Generate content output token count gemini 2.5 flash lite short output text non-thinking")
Reproduction
I built an Android app that calls the Gemini API for short text translation and grammar tasks. Every API response’s usageMetadata is logged to a JSONL file with promptTokenCount, candidatesTokenCount, and thoughtsTokenCount.
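For reference, a single logged JSONL line looks roughly like this (field names match the logging code shown later in this report; the values here are illustrative, taken from the typical translation-task profile):

```json
{"timestamp":1771500000000,"type":"translation","model":"gemini-2.5-flash-lite","inputTokens":54,"outputTokens":18,"thinkingTokens":0,"latencyMs":780,"success":true,"retries":0}
```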
On 2026-02-19, I compared my analytics logs against Google Cloud Billing for the same day.
Evidence
Request count — matches
| Source | Requests |
|---|---|
| Google AI Studio dashboard | 1,170 |
| My analytics log | 1,143 |
A ~2% difference, confirming the logger captures essentially all requests.
Input tokens — matches
| Source | Flash Lite input tokens |
|---|---|
| Google Cloud Billing | 137,944 |
| My analytics log (morning session, ~63% of day) | ~135,600 |
Close match, confirming promptTokenCount is accurate.
Output tokens — 8.5x discrepancy (Flash Lite only)
| Source | Flash Lite output tokens |
|---|---|
| Google Cloud Billing | 297,203 |
| My analytics log (morning session, ~63% of day) | ~34,800 |
| Ratio | 8.5x |
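The ratio in the table is straightforward arithmetic over the two totals (figures taken from the table above):

```kotlin
// Billed vs. logged output tokens for the same window.
val billedOutputTokens = 297_203.0
val loggedOutputTokens = 34_800.0   // sum of candidatesTokenCount, morning session

val ratio = billedOutputTokens / loggedOutputTokens
println("billed/logged ratio = %.1f".format(ratio))  // ≈ 8.5
```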
Control: Gemini 3 Flash Preview — exact match
On the same day, using the same logging code, I made 1 request to gemini-3-flash-preview:
| | Google Cloud Billing | My analytics |
|---|---|---|
| Text input tokens | 255 | 255 (logged as part of 1,355 total) |
| Image input tokens | 1,100 | 1,100 (logged as part of 1,355 total) |
| Output tokens | 859 | 859 |
Output tokens match exactly for Gemini 3 Flash Preview. The discrepancy is isolated to Flash Lite.
Cost impact
- 28-day billing: $12.65 for Flash Lite output tokens (SKU 7133-23F2-04B7)
- Expected cost based on `candidatesTokenCount`: ~$1.49
- Overcharge: ~$11.16 over 28 days
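The expected-cost figure follows directly from the billed amount and the measured token ratio (an approximation that assumes cost scales linearly with output tokens):

```kotlin
// Derive expected cost and overcharge from the 28-day billed amount.
val billed28Day = 12.65   // USD billed for Flash Lite output tokens (SKU 7133-23F2-04B7)
val ratio = 8.5           // billed/logged output-token ratio measured above

val expected = billed28Day / ratio
val overcharge = billed28Day - expected
println("expected ≈ $%.2f, overcharge ≈ $%.2f".format(expected, overcharge))
// expected ≈ $1.49, overcharge ≈ $11.16
```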
Logging code
The `candidatesTokenCount` is read directly from the API response:

```kotlin
// Parse the raw response body and log the token counts exactly as
// reported by the API's usageMetadata — no transformation applied.
val geminiResponse = json.decodeFromString(GeminiResponse.serializer(), responseBody)
val usage = geminiResponse.usageMetadata
analyticsLogger?.log(AnalyticsEvent(
    timestamp = startTime,
    type = requestType,
    model = model,
    inputTokens = usage?.promptTokenCount ?: 0,
    outputTokens = usage?.candidatesTokenCount ?: 0,  // the under-reported field
    thinkingTokens = usage?.thoughtsTokenCount ?: 0,
    latencyMs = System.currentTimeMillis() - startTime,
    success = true,
    retries = retries
))
```
The same code path is used for both Flash Lite and Gemini 3 Flash Preview requests. The only difference is the model URL.
Typical request/response profile
These are short text tasks (game translation):
- Translation: ~54 input tokens, ~18 output tokens, ~780ms
- Grammar: ~325 input tokens, ~78 output tokens, ~1000ms
The responses are short text — there is no scenario where the model is generating 8.5x more output than reported.
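This can be bounded without knowing the exact task mix: even if every one of the ~1,143 logged requests had been the larger grammar task, total output would still fall far short of the billed figure (a sketch; the request count and per-task token averages come from the numbers above):

```kotlin
// Upper-bound sanity check: assume EVERY request was the larger grammar
// task (~78 output tokens) — the real mix includes smaller translations.
val requests = 1_143
val maxTokensPerRequest = 78
val upperBound = requests * maxTokensPerRequest
println("upper bound ≈ $upperBound output tokens")
// ≈ 89,154 — still well below the 297,203 output tokens billed
```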
Request
- Investigate why `candidatesTokenCount` for gemini-2.5-flash-lite reports ~1/8.5th of the actual billed output tokens
- Clarify whether this is a billing error or a reporting error in `usageMetadata`
- If billing is correct, what accounts for the additional output tokens not reflected in `candidatesTokenCount`?
- Issue a billing adjustment if this is confirmed as a metering bug