Hi everyone,
I’m working with the Gemini Live API and I’m noticing an issue with input_audio_transcription events during long user responses. After a certain period of continuous speech, the transcription events either start arriving with a noticeable delay or sometimes stop coming altogether.
Before assuming it’s a client-side problem, I wanted to check with the community:
- Is this a known or common behavior when handling long audio segments with the Live API?
- Could this be related to any specific configuration (e.g., VAD settings, turn coverage, chunking limits, max tokens, or audio streaming options)?
- Are there recommended best practices to ensure stable and continuous input_audio_transcription delivery during long responses?
Any insights or shared experiences would be really helpful.
Thanks!