Gemini Live Flash 3.1 API: inputTranscription no longer streams incrementally

For this, you have two options:

  • Use client-side voice activity detection (VAD) on the audio you send, so you can track where speech starts and ends.

  • Rely on inputTranscription and run a timer between the moment the agent speaks and the moment you receive the input transcription from Gemini. You can also keep track of the data you sent to Gemini.
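To make the second option more concrete, here is a minimal sketch of the bookkeeping it implies. It is not tied to any Gemini SDK: TurnTimer and its methods on_agent_speech, on_audio_sent, and on_input_transcription are hypothetical names that you would call from your own send and receive handlers.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TurnTimer:
    """Hypothetical helper: times the gap until a transcription arrives."""

    # Injectable clock so the helper can be tested without sleeping.
    clock: Callable[[], float] = time.monotonic
    speech_at: Optional[float] = None
    bytes_sent: int = 0

    def on_agent_speech(self) -> None:
        # Call this at the moment speech begins; starts a new measurement.
        self.speech_at = self.clock()
        self.bytes_sent = 0

    def on_audio_sent(self, chunk: bytes) -> None:
        # Call this for every audio chunk you forward to the API.
        self.bytes_sent += len(chunk)

    def on_input_transcription(self, text: str) -> Optional[float]:
        # Call this when a transcription message arrives.
        # Returns the elapsed seconds, or None if no speech was marked.
        if self.speech_at is None:
            return None
        delay = self.clock() - self.speech_at
        self.speech_at = None
        return delay
```

The byte counter is kept alongside the timer so that each measured delay can be matched to the amount of audio that was actually sent for that turn.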

Client-side VAD is viable and not very complex.
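As a rough illustration of how little is needed, here is a minimal energy-threshold sketch. It assumes 16-bit signed mono PCM frames in native byte order. EnergyVad, its threshold, and its hangover_frames parameter are illustrative names and values, not part of any Gemini API, and a production setup would more likely use a trained VAD model than a plain loudness threshold.

```python
import math
from array import array
from typing import Optional


def frame_rms(pcm: bytes) -> float:
    """RMS amplitude of one frame of 16-bit signed PCM (native byte order)."""
    samples = array("h")
    samples.frombytes(pcm)  # raises ValueError if len(pcm) is odd
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class EnergyVad:
    """Hypothetical energy-based VAD that reports speech start and end."""

    def __init__(self, threshold: float = 500.0, hangover_frames: int = 10):
        self.threshold = threshold              # assumed value, tune per microphone
        self.hangover_frames = hangover_frames  # quiet frames needed to end speech
        self.speaking = False
        self._quiet = 0

    def process(self, pcm: bytes) -> Optional[str]:
        """Feed one frame; returns "start", "end", or None."""
        if frame_rms(pcm) >= self.threshold:
            self._quiet = 0
            if not self.speaking:
                self.speaking = True
                return "start"
            return None
        if self.speaking:
            self._quiet += 1
            if self._quiet >= self.hangover_frames:
                self.speaking = False
                self._quiet = 0
                return "end"
        return None
```

You would call process on each frame before sending it, and use the "start" and "end" results as the start and end points of the user's speech. The hangover count keeps short pauses between words from ending the segment too early.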