Audio Token Counts Unexpectedly Low in Gemini Live API

## Issue

I'm using `gemini-2.0-flash-live-001` via LiveKit. After a 2-minute voice conversation, the metrics show:

- Audio input: **3 tokens** (seems far too low)
- Audio output: **0 tokens** (even though the agent is speaking!)
- Text tokens: 13,521 input (normal, given the system prompt) and 74 output (I configured audio output, so I expected zero here)

## Questions

1. Does the 3 for audio input mean 3 chunks of audio rather than 3 tokens?
2. Why are there 0 audio output tokens when audio is clearly playing?
3. Do the 74 text output tokens actually represent the audio output? Even if so, that still seems too small.
4. What token counts should I expect for a 2-minute voice conversation?
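For a rough sense of scale on question 4: Gemini's audio documentation gives 32 tokens per second for audio input. Whether that rate applies unchanged to Live API sessions, and to audio output, is an assumption here, so treat the sketch below as a back-of-envelope check only. `estimate_audio_tokens` and its parameters are illustrative names, not part of any SDK.

```python
def estimate_audio_tokens(duration_seconds: float, tokens_per_second: float = 32.0) -> int:
    """Rough audio token estimate.

    The default of 32 tokens/second is the documented rate for Gemini audio
    input; applying it to Live API sessions is an assumption.
    """
    if duration_seconds < 0 or tokens_per_second < 0:
        raise ValueError("duration and rate must be non-negative")
    return round(duration_seconds * tokens_per_second)


# Upper bound if all 120 seconds of the conversation were audio at the assumed rate.
full_session = estimate_audio_tokens(120)
print(full_session)  # 3840
```

Even if only part of the two minutes was actual speech, an estimate on this order is far above the 3 audio input tokens reported.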

I need to understand these counts for accurate user billing.
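Since the goal is billing, one way to keep per-session totals is to sum the token details from every metrics event. This is a minimal sketch with no LiveKit imports. It assumes each metrics object may carry `input_token_details` and `output_token_details` with `audio_tokens` and `text_tokens` attributes (the `text_tokens` name is an assumption). `TokenTotals` is a hypothetical helper, and the events are stand-ins built from the numbers reported above.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class TokenTotals:
    """Running per-session token totals (hypothetical helper, not a LiveKit class)."""
    input_audio: int = 0
    input_text: int = 0
    output_audio: int = 0
    output_text: int = 0

    def add(self, metrics) -> None:
        # Metrics without token details (or with None fields) contribute zero.
        inp = getattr(metrics, "input_token_details", None)
        out = getattr(metrics, "output_token_details", None)
        self.input_audio += getattr(inp, "audio_tokens", 0) or 0
        self.input_text += getattr(inp, "text_tokens", 0) or 0
        self.output_audio += getattr(out, "audio_tokens", 0) or 0
        self.output_text += getattr(out, "text_tokens", 0) or 0


# Stand-in events: one with the counts reported above, one without token details.
events = [
    SimpleNamespace(
        input_token_details=SimpleNamespace(audio_tokens=3, text_tokens=13521),
        output_token_details=SimpleNamespace(audio_tokens=0, text_tokens=74),
    ),
    SimpleNamespace(),  # e.g. a metrics type that has no token details
]

totals = TokenTotals()
for metrics in events:
    totals.add(metrics)
print(totals)
```

In the real handler, `totals.add(ev.metrics)` would go inside the `metrics_collected` callback, so the totals reflect whatever the API reports, including any undercounting.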

## Code snippet

```python
# Imports assumed from the google-genai SDK and the LiveKit Google plugin
from google.genai.types import AudioTranscriptionConfig
from livekit.plugins import google

# Model setup
google.beta.realtime.RealtimeModel(
    model="gemini-2.0-flash-live-001",
    voice="Leda",
    input_audio_transcription=AudioTranscriptionConfig(),
    output_audio_transcription=AudioTranscriptionConfig(),
)

# Metrics (`session` is the existing agent session; its creation is not shown)
@session.on("metrics_collected")
def _on_metrics_collected(ev):
    inp = ev.metrics.input_token_details
    out = ev.metrics.output_token_details
    # inp.audio_tokens = 3, out.audio_tokens = 0
```

I'd really appreciate it if someone could look into this and clarify. I've been stuck on this for quite a while and haven't found answers anywhere.