A question regarding pricing of the Live API and Gemini 2.5 Flash Native Audio Dialogue:
The pricing documentation indicates a charge per second of audio processed. As far as I’m aware, it currently charges 32 audio tokens per second, although other sources claim 25 audio tokens per second. For the cost analysis, I need to clarify how this is calculated for a constantly open audio stream.
My specific questions are:
- Are periods of silence within the continuous audio stream billed at the same rate as periods when a user is actively speaking?
- Does the Live API charge 32 tokens or 25 tokens per second with the new Gemini 2.5 Flash Native Audio Dialogue model?
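To see why the rate matters for a constantly open stream, here is a minimal sketch comparing the two figures above over one hour. It assumes billing scales linearly with seconds of audio streamed; `audio_tokens` and `CANDIDATE_RATES` are hypothetical names for illustration, not part of any Google SDK.

```python
# Hypothetical helper: estimate input audio tokens for a continuously open
# stream under the two token rates mentioned in this thread.
# Assumption: token count scales linearly with seconds of audio streamed.

CANDIDATE_RATES = {"reported_32": 32, "other_sources_25": 25}  # audio tokens per second


def audio_tokens(seconds: float, tokens_per_second: int) -> int:
    """Return the number of audio tokens for `seconds` of streamed audio."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return round(seconds * tokens_per_second)


if __name__ == "__main__":
    one_hour = 60 * 60
    for label, rate in CANDIDATE_RATES.items():
        print(f"{label}: {audio_tokens(one_hour, rate):,} tokens per hour")
```

At 32 tokens per second an hour comes to 115,200 tokens versus 90,000 at 25, roughly a 28% difference, which is why the exact rate matters for a long-running session.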
That’s a really interesting question! I don’t know the answer, sorry - but normal API requests return summary token usage data, so, assuming the Live API does too, you should be able to test this quickly?
I would assume you’re charged per second, regardless of what that second contains, because it still has to process it to work out that it’s silent. Just a guess though - would love official confirmation on this too.
I just got a response from Google Cloud Billing:
Thanks for reaching out with your questions about Gemini 2.5 Flash Native Audio Dialogue pricing, especially for continuous audio streams. I’m happy to clarify!
Billing for Silence in Continuous Audio Streams
For a constantly open audio stream, periods of silence are indeed billed at the same rate as periods when a user is actively speaking.
Here’s why: The service is continuously processing the audio input to detect speech, discern background noise, and determine when a response is appropriate. Even when there’s no active speech, the system is still consuming resources to listen and process the stream. Therefore, all processed audio, including silence, contributes to the billed duration.
Audio Token Conversion Rate
Regarding the audio token conversion rate, the most current and accurate information from our documentation states that audio is converted to tokens at a fixed rate of 32 tokens per second.
While you might have come across other figures, please use 32 audio tokens per second for your cost analysis.
I hope this clears things up for your cost analysis! Let me know if you have any other questions.
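Put into numbers, the billing reply means the billed duration is the whole time the stream is open, not just the time someone is talking. The sketch below assumes the quoted 32 tokens per second and models a session as hypothetical speech/silence `Segment`s; the `is_speech` flag is only for comparison, since per the reply it does not change billing. The price per million tokens is left as a parameter rather than a real figure.

```python
from dataclasses import dataclass

TOKENS_PER_SECOND = 32  # rate quoted in the Google Cloud Billing reply


@dataclass
class Segment:
    seconds: float
    is_speech: bool  # hypothetical label; per the reply, billing ignores it


def billed_tokens(segments: list[Segment]) -> int:
    """Silence is billed like speech, so every streamed second counts."""
    total_seconds = sum(s.seconds for s in segments)
    return round(total_seconds * TOKENS_PER_SECOND)


def speech_only_tokens(segments: list[Segment]) -> int:
    """What the count would be if silence were free (it is not, per the reply)."""
    speech_seconds = sum(s.seconds for s in segments if s.is_speech)
    return round(speech_seconds * TOKENS_PER_SECOND)


def estimated_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Convert tokens to cost; supply the price from the official pricing page."""
    return tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # A 10-minute session with 2 minutes of speech and 8 minutes of silence.
    session = [Segment(120, True), Segment(480, False)]
    print("billed tokens:", billed_tokens(session))
    print("speech-only tokens:", speech_only_tokens(session))
```

In this example the billed count is 19,200 tokens, five times the 3,840 you would get by counting speech alone.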
So silence actually costs money.