Does Gemini Live native-audio bill cumulative prompt tokens on every turn? Cost seems to scale with turn count, not call duration.

Surajkumar_Salimath · July 2, 2026, 2:39am

I’m building a real-time voice application on the Gemini Live native-audio (speech-to-speech) model and I’m trying to understand the billing behaviour, because cost does not track call duration the way I expected.

What I’m seeing

Gemini Live returns a cumulative promptTokenCount that grows on every turn within a session. When I reconcile my logged token usage against what I was actually billed, the numbers only line up if the prompt context is billed again on each turn — i.e. cost scales with the number of conversational turns, not with how long the call lasts.

To check this, I re-priced a set of calls three ways from the raw per-turn usageMetadata:

Interpretation	Result vs. actual bill
Price only the final snapshot per call	~8.6× too low
Price the single largest aggregate snapshot	~1.6× too low
Sum every per-turn snapshot	matches within ~3%

Only the “sum every turn” interpretation reproduces the actual charge. That strongly suggests the cumulative context is re-billed each turn.
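For anyone trying to reproduce this reconciliation, here is a minimal sketch of two of the three interpretations: pricing only the final snapshot versus summing every per-turn snapshot. It assumes each logged usageMetadata object is kept as a dict holding its cumulative promptTokenCount. The function names, the sample numbers, and the single per-million price are hypothetical. Real billing prices audio input, text input, and output at different rates, so this only shows how far apart the two interpretations land. The "largest aggregate snapshot" variant is left out because its exact definition depends on how snapshots were grouped.

```python
from typing import Iterable, List, Mapping


def billable_tokens_final_snapshot(snapshots: List[Mapping[str, int]]) -> int:
    """Interpretation 1: only the last cumulative promptTokenCount of the call is billed."""
    return snapshots[-1]["promptTokenCount"] if snapshots else 0


def billable_tokens_sum_of_snapshots(snapshots: Iterable[Mapping[str, int]]) -> int:
    """Interpretation 3: every per-turn snapshot is billed (context re-billed each turn)."""
    return sum(s["promptTokenCount"] for s in snapshots)


def estimated_cost(tokens: int, usd_per_million: float) -> float:
    """Hypothetical flat price; real Live API pricing differs by modality and direction."""
    return tokens * usd_per_million / 1_000_000


# Hypothetical call: four turns whose cumulative prompt grows by ~1k tokens each.
call = [{"promptTokenCount": n} for n in (1_000, 2_000, 3_000, 4_000)]
final = billable_tokens_final_snapshot(call)
total = billable_tokens_sum_of_snapshots(call)
print(final, total, total / final)  # 4000 10000 2.5
```

With snapshots that grow by a roughly constant amount, the sum-to-final ratio is about (n + 1) / 2 for n turns. The gap between the two interpretations therefore widens as calls accumulate more turns.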

Why this is a problem

Because cost tracks turn count rather than duration, calls of nearly identical length can cost very different amounts. Two examples from my own data (durations rounded):

A 5.8-minute call cost ~37% less than a 5.6-minute call — the longer one was cheaper.
Two calls of ~7.1 minutes each differed in cost by ~31%.

Sample of the pattern (turns = number of usageMetadata snapshots in the session):

Call	Duration (min)	Turns	Final prompt tokens	Relative cost
A	0.4	2	~8.6k	very low
B	1.8	6	~59k	low
C	5.6	20	~160k	high
D	5.8	22	~222k	medium
E	7.1	29	~261k	high
F	3.3	12	~110k	medium

The turn count and cumulative token growth predict cost far better than duration does.
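A simplified model shows why turn count can dominate. Assume (hypothetically) a fixed base prompt of s tokens present on every turn, and that turn k appends a_k new tokens to the history. If every turn's full prompt P_k is billed, the billed prompt tokens B for an n-turn call are:

$$
P_k = s + \sum_{j=1}^{k} a_j, \qquad B(n) = \sum_{k=1}^{n} P_k .
$$

If every turn appends roughly the same amount a, and the total appended tokens A = n a roughly track call duration, then:

$$
B(n) = n\,s + a\,\frac{n(n+1)}{2} = n\,s + A\,\frac{n+1}{2}.
$$

For two calls of about the same length (similar A), billed prompt tokens therefore grow roughly linearly with the number of turns. That would explain why calls of similar duration can cost quite different amounts. Real turns vary in size, so this is an approximation only.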

Questions for the community / team

Can anyone confirm whether Gemini Live native audio bills the cumulative prompt context on every turn? Is that the intended, documented behaviour?
If so, what’s the recommended way to keep per-session cost predictable — e.g. context truncation, session resets, capping turns, or any billing setting I’ve missed?
Is there official documentation that describes exactly how per-turn usageMetadata maps to billed tokens for the Live API?
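To compare mitigation ideas such as session resets or turn caps, a rough simulation under the "re-bill cumulative context each turn" model can help. This is a sketch only: `billed_prompt_tokens` and `reset_every` are hypothetical names, not part of any Gemini API. The sketch ignores output tokens, per-modality pricing, and caching. It also assumes that a fresh session starts with an empty history. In practice, a reset means the model no longer hears earlier turns, and any summary you carry over would add to the base prompt.

```python
from typing import List, Optional


def billed_prompt_tokens(
    turn_tokens: List[int], base_tokens: int, reset_every: Optional[int] = None
) -> int:
    """Sum of per-turn prompt sizes when the whole history is re-billed every turn.

    turn_tokens[k]: new tokens appended to the context by turn k (hypothetical input).
    base_tokens:    fixed prompt (e.g. system instructions) included on every turn.
    reset_every:    if set, start a fresh session with empty history every N turns.
    """
    total = 0
    history = 0
    for k, added in enumerate(turn_tokens):
        if reset_every and k > 0 and k % reset_every == 0:
            history = 0  # new session: earlier turns are no longer in the prompt
        history += added
        total += base_tokens + history  # this turn's full prompt is billed
    return total


# Hypothetical 20-turn call, ~2k new tokens per turn, 1k-token base prompt.
no_reset = billed_prompt_tokens([2_000] * 20, base_tokens=1_000)
with_reset = billed_prompt_tokens([2_000] * 20, base_tokens=1_000, reset_every=5)
print(no_reset, with_reset)  # 440000 140000
```

Under this model, resetting every 5 turns caps how large any single prompt can grow. The total then scales roughly linearly with turn count instead of quadratically. Whether the lost context is acceptable depends on the application.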

I’m happy to share more anonymized per-turn usage data if it helps reproduce this. Mainly trying to understand whether this is expected behaviour and how others are managing predictability with the Live audio model.

Sai_Deepika_K · July 7, 2026, 11:23am

Hello @Surajkumar_Salimath,

Gemini Live re-bills the entire cumulative context on every turn. Because Gemini is natively multimodal, it retains the raw audio tokens from previous turns to preserve tone and nuance rather than converting them to text. As a result, the entire accumulated history is re-processed and charged at the standard audio input rate on every turn, so costs scale with turn count rather than call duration. For more information, please refer to this post.

