Live API Input Transcription Stops After ~100 Chunks Despite Audio Processing Continuing

I’m experiencing unexpected behavior with input_audio_transcription in the Live API (gemini-live-2.5-flash-preview; I also tried gemini-2.5-flash-preview-native-audio-dialog). When a user speaks for an extended period (say >15 seconds), the input_transcription stream stops sending chunks after approximately 100 chunks, even though:

1. Audio is still being captured and sent to the Live API.

2. The AI continues to process all audio correctly - when I ask the AI about content spoken after transcription stopped, it recalls everything accurately.

3. Only the transcription stream stops - not the underlying audio processing. Because the input transcript stalls, it feels like there is a delay before the AI responds.

The output transcription works perfectly.

Observed Behavior:

  • Transcription chunks flow normally for the first ~100 chunks (~10 seconds)

  • Then response.server_content.input_transcription stops appearing in responses

  • Audio continues to be processed (proven by AI’s accurate responses)

  • After user finishes speaking, AI responds correctly using ALL spoken content

Environment

  • Model: gemini-live-2.5-flash-preview (I also tried gemini-2.5-flash-preview-native-audio-dialog; the same thing happened)

  • Audio Format: PCM 16-bit, 16kHz mono

  • Audio Chunk Size: ~5KB per WebSocket send (buffered from 8x 1024-sample chunks)

Code Sample

Configuration:

config = {
    "response_modalities": ["AUDIO"],
    "system_instruction": base_prompt,  # ~200 lines
    "output_audio_transcription": {},
    "input_audio_transcription": {},
}

async with client.aio.live.connect(
    model="gemini-live-2.5-flash-preview", config=config
) as session:
    # ... audio input/output handlers
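For context, the input side buffers the 1024-sample capture chunks and forwards them to the session roughly like this (simplified sketch; mic_chunks is just a placeholder for my actual capture code, and the send_realtime_input call assumes a recent google-genai SDK - older versions use session.send instead):

from google.genai import types

BUFFER_CHUNKS = 8  # number of 1024-sample capture chunks buffered per send

async def handle_audio_input(session, mic_chunks):
    # Buffer raw 16 kHz, 16-bit mono PCM and forward it to the Live API.
    buffer = bytearray()
    buffered = 0
    async for chunk in mic_chunks:  # each chunk: 1024 samples of 16-bit PCM
        buffer.extend(chunk)
        buffered += 1
        if buffered >= BUFFER_CHUNKS:
            await session.send_realtime_input(
                audio=types.Blob(data=bytes(buffer), mime_type="audio/pcm;rate=16000")
            )
            buffer.clear()
            buffered = 0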

Audio Output Handler (receives responses from AI):

async def handle_audio_output():
    transcription_chunk_count = 0
    while session_active:
        turn = session.receive()
        async for response in turn:
            # AI's audio output - works fine
            if response.data:
                await websocket.send(response.data)

            # AI's transcription - works fine
            if response.server_content and response.server_content.output_transcription:
                ai_text = response.server_content.output_transcription.text
                logger.info(f"[AI] {ai_text}")

            # USER's transcription - STOPS after ~100 chunks
            if response.server_content and response.server_content.input_transcription:
                user_text = response.server_content.input_transcription.text
                transcription_chunk_count += 1
                logger.info(f"[USER] Chunk #{transcription_chunk_count}: {user_text}")
                # Stops logging after chunk ~100, but audio keeps flowing
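To convince myself that audio really keeps flowing after the transcription stalls, I also record timestamps on both sides and compare them from a small watchdog task. This is purely illustrative (last_audio_send_at and last_input_transcription_at are variables I update in the two handlers above; nothing here is part of the SDK):

import asyncio
import time

last_audio_send_at = 0.0             # set to time.monotonic() after every audio send
last_input_transcription_at = 0.0    # set to time.monotonic() on every input_transcription chunk

async def transcription_watchdog():
    # Warn if audio is still being sent but input transcription has gone silent.
    while session_active:
        await asyncio.sleep(5)
        gap = last_audio_send_at - last_input_transcription_at
        if gap > 5:
            logger.warning(f"Audio still flowing, but no input_transcription for {gap:.1f}s")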

Observed Log Pattern:

[USER] Chunk #1: " A"

[USER] Chunk #2: " use"

[USER] Chunk #3: " Go"

…

[USER] Chunk #98: " of"

[USER] Chunk #99: " fun"

[USER] Chunk #100: " for"

[USER] Chunk #101: " us"

# … then nothing, even though the user continues speaking for 20+ more seconds

# AI responds accurately to ALL content including post-chunk-100 speech

I wonder whether this is expected behavior of the Live API or whether I’m doing something wrong in my code. It feels like there’s a magic knob somewhere that could fix this issue with a simple turn. Any suggestions or help would be greatly appreciated.


We’re also experiencing this issue. Have you figured out a fix, or can anyone at Google provide additional context?

I didn’t find a fix. I switched to the browser’s Web Speech API for the input transcript.