Gemini Live API: Delays or Missing input_audio_transcription Events

Hi everyone,

I’m working with the Gemini Live API and seeing a problem with input_audio_transcription events during long user responses. After a stretch of continuous speech, the transcription events either start arriving with a noticeable delay or stop arriving altogether.
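
To put numbers on "noticeable delay", a small client-side helper along these lines could log the time between consecutive transcription events. This is only a sketch: `TranscriptionGapMonitor` and its methods are hypothetical names, not part of the Live API or any SDK, and the 2-second threshold is an arbitrary assumption. The idea is to call `on_transcription()` wherever the client receives each input_audio_transcription event.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TranscriptionGapMonitor:
    """Hypothetical helper: tracks time between transcription events.

    It is API-agnostic and never touches the Live API itself.
    """
    stall_threshold_s: float = 2.0          # assumed threshold, tune as needed
    clock: Callable[[], float] = time.monotonic
    last_event_at: Optional[float] = None
    gaps: List[float] = field(default_factory=list)

    def on_transcription(self) -> Optional[float]:
        """Record one event; return seconds since the previous one (None for the first)."""
        now = self.clock()
        gap = None if self.last_event_at is None else now - self.last_event_at
        if gap is not None:
            self.gaps.append(gap)
        self.last_event_at = now
        return gap

    def slow_gaps(self) -> List[float]:
        """Gaps that exceeded the stall threshold."""
        return [g for g in self.gaps if g > self.stall_threshold_s]

    def is_stalled(self) -> bool:
        """True if no event has arrived for longer than the threshold."""
        if self.last_event_at is None:
            return False
        return self.clock() - self.last_event_at > self.stall_threshold_s
```

Logging `slow_gaps()` alongside the timestamps of the audio chunks being sent would show whether the delay grows steadily with speech length or appears abruptly at a certain point, which should help separate a client-side streaming issue from server-side behavior.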

Before assuming it’s a client-side problem, I wanted to check with the community:

  • Is this a known or common behavior when handling long audio segments with the Live API?

  • Could this be related to any specific configuration (e.g., VAD settings, turn coverage, chunking limits, max tokens, or audio streaming options)?

  • Are there recommended best practices to ensure stable and continuous input_audio_transcription delivery during long responses?

Any insights or shared experiences would be really helpful.
Thanks!