Live API: 5–6 Second Response Latency

@Logan_Kilpatrick @Mustan_lokhand

Hi team,

I’m seeing consistently high response latency with the Gemini Live API in a production-like voice simulation setup.

Use case

We use Gemini as a simulated user/caller to test a target voice agent (target agent speaks first).
Flow is: Twilio Media Stream → Server → Gemini Live WS → Server → UI.

Model

gemini-2.5-flash-native-audio-preview-12-2025

What we are already doing (per docs / best practices)

  • Input audio sent as 16-bit PCM, 16kHz, mono (audio/pcm;rate=16000)

  • Output audio handled at 24kHz

  • Audio sent in 20ms chunks (within recommended 20–40ms)

  • Long-lived websocket session (no reconnect per turn)

  • Ordered ingest/serialized dispatch (no parallel frame processing)

  • Minimal client buffering (small startup buffer only)

  • Local/manual VAD with explicit activityStart / activityEnd

  • Manual silence threshold around 510ms

  • We log:

    • activityEnd requested

    • activityEnd sent

    • queueDrainMs

    • geminiProcessingMs

    • first response audio timestamp
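For reference, here is a minimal Python sketch of how the logged fields could relate to each other. It is not our actual code. The class and field names are hypothetical, and it assumes that queueDrainMs is the gap between "activityEnd requested" and "activityEnd sent", and that geminiProcessingMs is the gap between "activityEnd sent" and the first response audio. It also shows the chunk-size arithmetic for the input format listed above.

```python
from dataclasses import dataclass
from typing import Optional

# Input format from the list above: 16-bit PCM, 16 kHz, mono, 20 ms chunks.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
CHANNELS = 1
CHUNK_MS = 20


def chunk_size_bytes(chunk_ms: int = CHUNK_MS) -> int:
    """Bytes of raw PCM contained in one chunk of the given duration."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * chunk_ms // 1000


@dataclass
class TurnTimings:
    """Per-turn timestamps in ms on one monotonic clock (hypothetical names)."""

    activity_end_requested_ms: float
    activity_end_sent_ms: Optional[float] = None
    first_response_audio_ms: Optional[float] = None

    def queue_drain_ms(self) -> Optional[float]:
        # ASSUMPTION: drain time = activityEnd sent - activityEnd requested.
        if self.activity_end_sent_ms is None:
            return None
        return self.activity_end_sent_ms - self.activity_end_requested_ms

    def gemini_processing_ms(self) -> Optional[float]:
        # ASSUMPTION: processing time = first response audio - activityEnd sent.
        if self.activity_end_sent_ms is None or self.first_response_audio_ms is None:
            return None
        return self.first_response_audio_ms - self.activity_end_sent_ms


if __name__ == "__main__":
    print(chunk_size_bytes())  # bytes per 20 ms chunk
    turn = TurnTimings(0.0, 1654.0, 3899.0)
    print(turn.queue_drain_ms(), turn.gemini_processing_ms())
```

With this format, each 20ms chunk is 640 bytes of PCM. Under these definitions, geminiProcessingMs also includes the network round trip to and from the Live API, so it is an upper bound on server-side time rather than a pure model measurement.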

Observed behavior

  • End-to-end perceived latency often 5–6s, sometimes higher on first turn

  • In many turns, local queue drain is now low (often ~100–300ms), but geminiProcessingMs is often ~2.2–3.8s and sometimes ~5.4s

  • Example first turn from logs:

    • queuedChunksAtActivityEnd: 83

    • queueDrainMs: 1654ms

    • geminiProcessingMs: 2245ms

  • Another run showed geminiProcessingMs around 5423ms even with very low queue drain
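To put the first-turn numbers in context, here is a rough latency budget as a Python sketch. It assumes each queued chunk holds 20ms of audio, and that the silence threshold, queue drain, and Gemini processing happen one after another following the end of speech. The function name and dictionary keys are made up for illustration.

```python
CHUNK_MS = 20               # chunk duration from the setup list
SILENCE_THRESHOLD_MS = 510  # manual VAD silence threshold (approximate)


def turn_budget(queued_chunks: int, queue_drain_ms: float, gemini_processing_ms: float) -> dict:
    """Break one turn's logged numbers into a simple sequential budget."""
    queued_audio_ms = queued_chunks * CHUNK_MS
    # Average wall-clock time spent draining each queued chunk.
    drain_ms_per_chunk = queue_drain_ms / queued_chunks if queued_chunks else 0.0
    # ASSUMPTION: the three stages run back to back after the caller stops speaking.
    accounted_ms = SILENCE_THRESHOLD_MS + queue_drain_ms + gemini_processing_ms
    return {
        "queued_audio_ms": queued_audio_ms,
        "drain_ms_per_chunk": drain_ms_per_chunk,
        "accounted_ms": accounted_ms,
    }


if __name__ == "__main__":
    # First-turn values from the logs above.
    print(turn_budget(queued_chunks=83, queue_drain_ms=1654, gemini_processing_ms=2245))
```

For the first turn, 83 queued chunks correspond to 1660ms of audio, drained in 1654ms (about 19.9ms per chunk). That would be consistent with the backlog being sent at roughly real-time pace rather than flushed in a burst, though the logs alone do not prove it. Under the sequential assumption, these three stages account for about 4409ms of the perceived latency.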

Questions

  1. Is this latency profile expected for this model in a speech-to-speech proxy architecture?

  2. Are there known server-side factors (region, context growth, session duration, model settings) that cause 5s+ spikes?

  3. Any recommended tuning for manual VAD + telephony streams beyond what we already do?

We first observed this latency with automatic VAD, then moved to manual VAD for explicit control over response latency. We were surprised that the long latency still persists, and that it is actually Gemini itself taking 4–5+ seconds to respond, which is unacceptable. Can anyone please help here?