1. Rate Limits Page (Correct)
The Google AI Studio Rate Limits page shows I’m using gemini-2.5-flash-lite:
- Model: gemini-2.5-flash-lite
- Category: Text-out models
- RPM: 1 / 4K
- TPM: 34.08K / 4M
2. Billing Report (Mismatch)
| SKU | Usage | Cost |
|---|---|---|
| gemini 2.5 flash lite short input text | 92,797 tokens | $0.29 |
| gemini 2.5 flash short output text non-thinking | 123,298 tokens | $1.55 |
The output cost of $1.55 for 123,298 tokens works out to about $12.57 per 1M tokens, which is much higher than:
- Expected Flash Lite output price: $0.40/1M
- Even the Flash output price: $2.50/1M
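The per-1M figure can be reproduced with a short calculation. This is a minimal sketch: `cost_per_million` is a hypothetical helper, and the list prices are the ones quoted in this post, not values pulled from any pricing API.

```python
def cost_per_million(cost_usd: float, tokens: int) -> float:
    """Effective price in USD per 1M tokens."""
    return cost_usd / tokens * 1_000_000


# Output row from the billing report above.
observed = cost_per_million(1.55, 123_298)

# Output list prices quoted in this post (USD per 1M tokens).
expected = {
    "gemini-2.5-flash-lite": 0.40,
    "gemini-2.5-flash": 2.50,
}

print(f"observed: ${observed:.2f}/1M")
for model, price in expected.items():
    # How many times the list price the observed rate is.
    print(f"{model}: ${price:.2f}/1M ({observed / price:.1f}x)")
```

This prints an observed rate of $12.57/1M, roughly 31.4x the Flash Lite output price and 5.0x the Flash output price.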
My Code
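```python
import os

from google import genai

# API key read from an environment variable (GEMINI_API_KEY is my own naming).
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

model_name = "gemini-2.5-flash-lite"  # Also tried "models/gemini-2.5-flash-lite"

response = client.models.generate_content(
    model=model_name,
    contents=contents,  # prompt text/parts, built elsewhere in my code
)
```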
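To check whether the output is actually coming from a different model, one option is to log the model reported in each response together with its token counts and compare them against the billing report. This sketch assumes the google-genai response exposes `model_version` and `usage_metadata` (with `prompt_token_count`, `candidates_token_count` and `thoughts_token_count`); `summarize_usage` is a hypothetical helper, and the stub values are placeholders, not real data.

```python
from types import SimpleNamespace


def summarize_usage(response) -> dict:
    """Report which model served a request and how many tokens it used.

    Intended for a google-genai GenerateContentResponse; any field the
    response lacks is reported as None instead of raising.
    """
    usage = getattr(response, "usage_metadata", None)
    return {
        "model_version": getattr(response, "model_version", None),
        "input_tokens": getattr(usage, "prompt_token_count", None),
        "output_tokens": getattr(usage, "candidates_token_count", None),
        "thinking_tokens": getattr(usage, "thoughts_token_count", None),
    }


# Hypothetical stand-in so the sketch runs offline; with the real client,
# call summarize_usage(response) right after generate_content().
fake_response = SimpleNamespace(
    model_version="gemini-2.5-flash-lite",
    usage_metadata=SimpleNamespace(
        prompt_token_count=120, candidates_token_count=80, thoughts_token_count=None
    ),
)
print(summarize_usage(fake_response))
```

Summing `output_tokens` across all calls and comparing it with the 123,298 tokens on the output SKU would show whether the billed usage matches what the responses report.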
Environment
- SDK: google-genai (latest version)
- Model: gemini-2.5-flash-lite
- API: Google AI Studio (not Vertex AI)
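Since "latest version" changes over time, the exact installed SDK version can be reported instead. A minimal sketch using the standard library's `importlib.metadata`, assuming the package is installed under the distribution name `google-genai`; `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "google-genai") -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("google-genai:", installed_version() or "not installed")
```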
Questions
- Why are output tokens billed under a different SKU (gemini 2.5 flash) than input tokens (gemini 2.5 flash lite)?
- Is any additional configuration needed to ensure output tokens are also billed at the Flash Lite rate?
- Is this a known billing system issue?
Thank you for your help!