For a couple of days now, the Gemini Live API has been reporting prompt token consumption roughly three times higher than the actual count. I'm seeing the same problem with both the gemini-live-2.5-flash-preview and gemini-2.5-flash-native-audio-preview-09-2025 models.
The initial prompt token count is the sum of a system prompt, the grounding documentation (which I pass as a media file), a tool (a function declared with only a name and description), and an initial user text prompt, a greeting such as "Good morning", which I send so that the model gives the user the illusion of picking up the phone, saying hello, and introducing itself.
The grounding documentation, i.e., the media file, is a Markdown file, so it is plain text.
If I send the same payload (system prompt + media file + tool + greeting user prompt) to the token counting endpoint https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:countTokens, the total prompt token count is 8431.
If I instead include the content of the media file in the user prompt (along with the greeting), rather than passing it as media, the count increases by just 1, to 8432, which is consistent with the previous result.
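For reference, this is roughly how the countTokens request is shaped. The system prompt, tool name, description, and file contents below are placeholders, and the request is only constructed here, not actually sent (the real call would be an HTTP POST with an API key):

```python
import json

# Hypothetical placeholders for the real payload pieces.
SYSTEM_PROMPT = "You are Anna, the receptionist..."   # placeholder system prompt
GREETING = "Buongiorno"                               # initial user text prompt

# REST body for POST .../models/gemini-2.5-flash:countTokens.
# To count tokens including systemInstruction and tools, the request is
# wrapped in generateContentRequest rather than using a bare "contents" field.
payload = {
    "generateContentRequest": {
        "model": "models/gemini-2.5-flash",
        "systemInstruction": {"parts": [{"text": SYSTEM_PROMPT}]},
        "tools": [{
            "functionDeclarations": [{
                "name": "check_availability",          # placeholder tool name
                "description": "Checks room availability",  # placeholder
            }]
        }],
        "contents": [{
            "role": "user",
            "parts": [
                # The grounding Markdown file passed as inline media data.
                {"inlineData": {"mimeType": "text/markdown",
                                "data": "<base64 of the Markdown file>"}},
                {"text": GREETING},
            ],
        }],
    }
}

# Print a preview of the serialized request body.
print(json.dumps(payload, indent=2)[:120])
```

The response to this request contains a totalTokens field, which is where the 8431 figure comes from.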
Instead, using the Gemini Live API with the same system prompt + media file + tool + greeting user prompt, roughly triple the actual number of tokens is counted immediately, before I even start speaking:

- 23,842 tokens with the `gemini-live-2.5-flash-preview` model
- 24,293 tokens with the `gemini-2.5-flash-native-audio-preview-09-2025` model
At the very least, I would expect both models to report the same number of tokens, even if it’s incorrect. But they don’t even agree with each other, and the difference of about 450 tokens is significant.
I noticed this because the total token consumption for each individual test was usually around 30,000-40,000 tokens. But yesterday, one session skyrocketed to 260,000, and even today it is never below 130,000-140,000.
A test session usually consists of 4 or 5 turns, so as a maximum prompt token count I would expect roughly 8431 * 5 = about 42,000-45,000 tokens, not more than 100,000!
I ran some tests by printing the usage metadata and found that the prompt tokens are the culprit.
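Concretely, I accumulate the usageMetadata that arrives in the Live API server messages. A minimal sketch of that bookkeeping, assuming the camelCase JSON field names of the BidiGenerateContent usage metadata (the session data below is made up, shaped like what I print during a test):

```python
from collections import Counter

def accumulate_usage(messages):
    """Sum token counts across the usageMetadata of Live API server messages."""
    totals = Counter()
    for msg in messages:
        usage = msg.get("usageMetadata")
        if not usage:
            continue  # most messages (audio chunks, etc.) carry no usage data
        totals["prompt"] += usage.get("promptTokenCount", 0)
        totals["response"] += usage.get("responseTokenCount", 0)
        totals["total"] += usage.get("totalTokenCount", 0)
    return totals

# Example with made-up numbers shaped like a two-turn session:
session = [
    {"serverContent": {}},  # message without usage metadata, skipped
    {"usageMetadata": {"promptTokenCount": 23842, "responseTokenCount": 120,
                       "totalTokenCount": 23962}},
    {"usageMetadata": {"promptTokenCount": 24500, "responseTokenCount": 90,
                       "totalTokenCount": 24590}},
]
usage = accumulate_usage(session)
print(usage["prompt"], usage["response"])  # 48342 210
```

Printing these running totals per turn is what showed the prompt component growing far faster than the actual conversation.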
I also noticed that the first sentence the model must say at the start of the session, which per the system prompt instructions is always the same, "Buongiorno, Grand Hotel Apòsa, sono Anna. Come posso aiutarla?" (Italian for "Good morning, Grand Hotel Apòsa, this is Anna. How can I help you?"), is counted differently by the two models:

- 18 tokens with the `gemini-live-2.5-flash-preview` model
- 104 tokens with the `gemini-2.5-flash-native-audio-preview-09-2025` model
But this is a less serious problem.
For completeness, the language of the application is Italian.
Can you check if there are any problems with the token count for the Gemini Live API?
Thank you for your cooperation.