Gemini Live API (Native Audio): Response Latency Gradually Increases During Long Sessions

Summary

I am seeing response latency degrade consistently over the course of a session when using the Gemini Live API with native audio streaming. At the start of a session the assistant responds almost instantly. As the conversation continues, the response time grows steadily, even though user input length, speaking pace, and audio quality stay the same.

This behavior occurs only with audio input. When the same prompt is sent as text input within the same session, responses remain fast and stable.

Restarting the session immediately resets the latency to a low level.


Observed Behavior

  1. Start a new Live session
    • Assistant replies with very low latency
  2. Continue the conversation using audio input
    • Response latency gradually increases over time
  3. After several turns
    • Assistant takes noticeably longer to respond
  4. Send the same message as text input (without restarting the session)
    • Response is returned in under 1 second
  5. Restart the session
    • Latency resets and becomes fast again

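This suggests that latency accumulates over the lifetime of the session and that the accumulation is specific to the audio input path, since text input in the same session stays fast.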
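
For anyone trying to reproduce this, the drift can be quantified by logging one latency value per turn and fitting a line through the values. The following is a minimal sketch, not the exact instrumentation used for this report. It assumes latency is measured from the end of user speech to the first response audio, and `TurnLatencyLog` is a hypothetical helper that is not part of LiveKit or the Gemini SDK.

```python
from dataclasses import dataclass, field


@dataclass
class TurnLatencyLog:
    """Collects per-turn response latencies (in seconds) for one session.

    Hypothetical helper for illustration; not a LiveKit or Gemini API.
    """

    latencies: list[float] = field(default_factory=list)

    def record(self, user_speech_end: float, first_response_audio: float) -> None:
        # Both arguments are timestamps in seconds from the same clock,
        # for example time.monotonic().
        self.latencies.append(first_response_audio - user_speech_end)

    def slope_per_turn(self) -> float:
        """Least-squares slope of latency versus turn index (seconds per turn)."""
        n = len(self.latencies)
        if n < 2:
            return 0.0
        mean_x = (n - 1) / 2
        mean_y = sum(self.latencies) / n
        cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(self.latencies))
        var = sum((i - mean_x) ** 2 for i in range(n))
        return cov / var
```

A slope near zero would mean latency is stable across turns. A clearly positive slope that drops back to zero after a restart would match the behavior described above.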


Expected Behavior

Response latency should remain stable throughout a session, assuming similar input length, speaking behavior, and audio quality.


Models Tested

  • gemini-2.5-flash-native-audio-preview-12-2025

Session Initialization

from livekit.agents.voice import AgentSession
from livekit.plugins import google

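# Agent session backed by the Gemini native audio realtime model
# through the LiveKit Google plugin.
session = AgentSession(
    llm=google.beta.realtime.RealtimeModel(
        model="gemini-2.5-flash-native-audio-preview-12-2025",
        voice="Zephyr",
        temperature=0.2,
    ),
)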

Audio is streamed continuously via LiveKit, and responses are generated using:

await session.generate_reply(
    instructions=prompt_text,
    allow_interruptions=True,
)

The session does not explicitly replay history, use long system prompts, or inject custom context.


Additional Notes

  • The issue is reproducible across:
    • LiveKit Agent
    • LiveKit Agent Playground
    • Direct Gemini Live API usage
  • The slowdown does not appear to correlate with:
    • Network latency
    • Audio quality
    • User speech length
  • Restarting the session consistently resolves the issue.
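
Because a restart resets latency, one possible stopgap is to rotate the session once recent turns get slow. The sketch below covers only the decision logic and assumes a per-turn latency value is already available. `SessionRotationPolicy`, its threshold, and its window size are hypothetical, and how the session is actually closed and recreated is left to the caller.

```python
from collections import deque


class SessionRotationPolicy:
    """Advises a session restart when recent turns are all slow.

    Hypothetical helper for illustration; not a LiveKit or Gemini API.
    """

    def __init__(self, threshold_s: float = 2.0, window: int = 3) -> None:
        self.threshold_s = threshold_s
        # Only the most recent `window` latencies are kept.
        self.recent = deque(maxlen=window)

    def observe(self, latency_s: float) -> bool:
        """Record one turn's latency; return True when a restart is advised."""
        self.recent.append(latency_s)
        window_full = len(self.recent) == self.recent.maxlen
        # Require every turn in the window to be slow, so that a single
        # outlier does not trigger a restart.
        return window_full and min(self.recent) > self.threshold_s

    def reset(self) -> None:
        """Clear the history after the session has been restarted."""
        self.recent.clear()
```

In use, the caller would pass each turn's measured latency to `observe()` and, when it returns True, restart the session and call `reset()`. This only works around the symptom and loses conversation context on each rotation.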

Questions

  1. Is there a known limitation or internal behavior in Gemini native audio models where response latency increases as session context grows?
  2. Are there recommended best practices for:
    • Managing long-running audio sessions
    • Resetting or pruning context
    • Preventing response latency degradation over time

Any guidance or confirmation from the team would be greatly appreciated.