Audio Token Counts Unexpectedly Low in Gemini Live API

JamesIntallaga · October 22, 2025, 1:58am

## Issue

Using `gemini-2.0-flash-live-001` via LiveKit. 2-minute voice conversation shows:

- Audio input: **3 tokens** (seems too low)

- Audio output: **0 tokens** (agent is speaking!)

- Text tokens: 13,521 input (normal, given the system prompt), 74 output (I set audio output, so this should have been zero).

## Questions

1. Does the audio input count of 3 mean 3 chunks of audio rather than 3 tokens?

2. Why 0 audio output tokens when audio is playing?

3. Do the 74 text output tokens actually represent the audio output? Even if so, that still seems too small.

4. What’s expected for a 2-min voice conversation?

Need to understand this for accurate user billing.
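For question 4, a rough expectation can be worked out from a per-second audio token rate. The sketch below assumes 32 tokens per second, which is the figure quoted later in this thread and not a value confirmed here; the function name is hypothetical.

```python
# Assumed rate: 32 audio tokens per second (figure quoted later in this
# thread, not confirmed here).
ASSUMED_AUDIO_TOKENS_PER_SECOND = 32


def expected_audio_tokens(seconds, rate=ASSUMED_AUDIO_TOKENS_PER_SECOND):
    """Return the token count implied by `seconds` of continuous audio."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return round(seconds * rate)


# A 2-minute call with audio streaming the whole time in one direction.
print(expected_audio_tokens(120))  # 3840
```

Under that assumption, a count of 3 is roughly three orders of magnitude below what two minutes of continuous audio would produce, which is why the reported figure looks suspicious.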

## Code snippet

```python
# Imports assumed from the names used below
from google.genai.types import AudioTranscriptionConfig
from livekit.plugins import google

# Model setup
google.beta.realtime.RealtimeModel(
    model="gemini-2.0-flash-live-001",
    voice="Leda",
    input_audio_transcription=AudioTranscriptionConfig(),
    output_audio_transcription=AudioTranscriptionConfig(),
)

# Metrics (`session` is the agent session created elsewhere)
@session.on("metrics_collected")
def _on_metrics_collected(ev):
    inp = ev.metrics.input_token_details
    out = ev.metrics.output_token_details
    # Observed: inp.audio_tokens = 3, out.audio_tokens = 0
```

I would really appreciate it if someone could look into this and clear things up. I have been stuck on this for quite a while and have not found answers anywhere.
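One thing worth ruling out is that the handler is reading a single event: if `metrics_collected` fires more than once per session, the per-event counts need to be summed before comparing them with the whole conversation. A minimal sketch of such an accumulator follows. `TokenDetails`, `FakeMetrics`, and `TokenTotals` are hypothetical stand-ins, not LiveKit classes; only the attribute names `input_token_details`, `output_token_details`, and `audio_tokens` come from the snippet above, and `text_tokens` is assumed by analogy.

```python
from dataclasses import dataclass, field


@dataclass
class TokenDetails:
    """Hypothetical stand-in for a per-modality token breakdown."""
    audio_tokens: int = 0
    text_tokens: int = 0  # assumed attribute name


@dataclass
class FakeMetrics:
    """Hypothetical stand-in for the object exposed as `ev.metrics`."""
    input_token_details: TokenDetails = field(default_factory=TokenDetails)
    output_token_details: TokenDetails = field(default_factory=TokenDetails)


@dataclass
class TokenTotals:
    """Running totals across every metrics event seen in a session."""
    input_audio: int = 0
    output_audio: int = 0
    input_text: int = 0
    output_text: int = 0
    events: int = 0

    def add(self, metrics) -> None:
        # getattr with defaults keeps this tolerant of missing or None fields.
        inp = getattr(metrics, "input_token_details", None)
        out = getattr(metrics, "output_token_details", None)
        self.input_audio += getattr(inp, "audio_tokens", 0) or 0
        self.input_text += getattr(inp, "text_tokens", 0) or 0
        self.output_audio += getattr(out, "audio_tokens", 0) or 0
        self.output_text += getattr(out, "text_tokens", 0) or 0
        self.events += 1


# Illustrative events only; these numbers are not from a real session.
totals = TokenTotals()
for m in [
    FakeMetrics(TokenDetails(3, 100), TokenDetails(0, 10)),
    FakeMetrics(TokenDetails(5, 200), TokenDetails(0, 20)),
]:
    totals.add(m)
print(totals)
```

In the real handler, `totals.add(ev.metrics)` would replace the two local assignments. If the summed audio counts are still near zero at the end of the call, the undercount is not an aggregation artifact.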

Pannaga_J · December 2, 2025, 1:17pm

Hi @JamesIntallaga, apologies for the late response.
Thanks for the heads-up. This error could be due to a faulty LiveKit integration with the `gemini-2.0-flash-live-001` model. Since this model is being deprecated on December 9th, please confirm whether the same issue is present with any other Live model.

Sanjay_Shreeyans_Jav · December 3, 2025, 3:31am

When using the Gemini Live API (gemini-2.5-flash-native-audio-preview-09-2025 model) directly via WebSocket (not LiveKit), the usageMetadata.responseTokenCount values appear significantly lower than expected for audio output.

Environment:

Model: models/gemini-2.5-flash-native-audio-preview-09-2025
API: Direct WebSocket connection to Live API
responseModalities: ["AUDIO"]

Observed Behavior: In a conversation with 5+ AI audio responses totaling approximately 60-90 seconds of spoken audio, I receive:

responseTokenCount: 380 (which, at 32 tokens/sec, is only ~12 seconds)
responseTokensDetails: [{modality: "AUDIO", tokenCount: 380}]

How do I know how many output tokens I am being billed for? It isn't very clear in the Google Cloud console or AI Studio (please correct me if I am wrong).
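To compare reported usage with playback time, it can help to sum `responseTokensDetails` over every `usageMetadata` payload received and convert the audio total into implied seconds. The sketch below assumes each server message has already been parsed into a dict and that the counts are per message rather than cumulative, which is an assumption worth checking against real logs. The helper names are hypothetical, and the 32 tokens/sec rate is the figure used in this post, not a verified value.

```python
from collections import Counter

ASSUMED_AUDIO_TOKENS_PER_SECOND = 32  # rate quoted in this post, not verified


def sum_response_tokens(messages):
    """Sum responseTokensDetails per modality across parsed server messages."""
    totals = Counter()
    for message in messages:
        # Messages that carry no usageMetadata contribute nothing.
        usage = message.get("usageMetadata") or {}
        for detail in usage.get("responseTokensDetails") or []:
            modality = detail.get("modality", "UNKNOWN")
            totals[modality] += int(detail.get("tokenCount", 0))
    return totals


def implied_audio_seconds(audio_tokens, rate=ASSUMED_AUDIO_TOKENS_PER_SECOND):
    """Convert an audio token total into seconds at the assumed rate."""
    return audio_tokens / rate


# Illustrative messages only: the per-message split is made up so that the
# total matches the 380 tokens reported above.
messages = [
    {"usageMetadata": {"responseTokenCount": 200,
                       "responseTokensDetails": [{"modality": "AUDIO", "tokenCount": 200}]}},
    {"serverContent": {}},
    {"usageMetadata": {"responseTokenCount": 180,
                       "responseTokensDetails": [{"modality": "AUDIO", "tokenCount": 180}]}},
]

totals = sum_response_tokens(messages)
print(totals["AUDIO"], implied_audio_seconds(totals["AUDIO"]))  # 380 11.875
```

If the summed total over a whole session still implies far fewer seconds than were actually played, that points to the reported counts themselves rather than to how they were collected.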

Pooja_Kapse · January 13, 2026, 12:21pm

Hi @Sanjay_Shreeyans_Jav ,
Thanks for reporting this issue.
To help us debug this, could you please provide a minimal reproducible code sample?
