Gemini-2.5-flash-native-audio-preview with manual VAD (disabled: True) - Gemini never responds after ActivityEnd, session dies with 1011 keepalive ping timeout

olaniyi_george · April 20, 2026, 1:38pm

Model: gemini-2.5-flash-native-audio-preview-12-2025
SDK: Python google-genai, WebRTC audio bridge via aiortc
VAD mode: Manual (automatic_activity_detection: { disabled: True })
Thinking: thinking_budget: 0
Tools: Yes - 4 custom function declarations

Setup

We’re building an AI interview platform over WebRTC. The audio pipeline is:

Browser mic → aiortc WebRTC track → resampled 48kHz→16kHz → send_realtime_input(audio=...)
We disabled auto-VAD and implemented our own amplitude-based VAD on the server
On speech start: send_realtime_input(activity_start=ActivityStart())
On speech end: send_realtime_input(activity_end=ActivityEnd())
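For reference, the manual-VAD part of the setup above corresponds roughly to the config below. This is a sketch using the plain-dict form of the Live config; `build_live_config` is a hypothetical helper name (not part of the SDK), and the tools and thinking settings are left out.

```python
def build_live_config() -> dict:
    """Live config with automatic activity detection turned off."""
    return {
        "response_modalities": ["AUDIO"],
        "realtime_input_config": {
            # With automatic detection disabled, the client has to mark
            # turn boundaries itself with ActivityStart / ActivityEnd.
            "automatic_activity_detection": {"disabled": True},
        },
    }


CONFIG = build_live_config()
```

The resulting dict is what would be passed as the `config` argument when opening the Live session.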

The Gemini session opens, the agent greets the candidate (turn_complete fires correctly), and then the candidate speaks. After this the session consistently dies with:

websockets.exceptions.ConnectionClosedError: sent 1011 (internal error) keepalive ping timeout; no close frame received

Two failure modes we’ve observed

Mode A: VAD fires speech END, ActivityEnd is sent, but Gemini never responds. Audio chunks stop, no turn_complete ever arrives, 1011 after ~10–12 seconds.

Mode B (more common): VAD speech START fires and audio streams for 30–60 seconds, but VAD speech END never fires. When the candidate stops talking, track.recv() times out (no frames) instead of delivering silence, so our VAD silence counter never accumulates enough to trigger ActivityEnd. Gemini sits waiting, and the session closes with 1011 after ~10–12 seconds.

Log excerpt (Mode B - no speech END)

14:05:43 [GeminiLive] turn_complete — agent greeted candidate
14:05:44 [Bridge] VAD unlocked — mic ready
14:05:45 [Bridge] VAD: speech START
14:05:45 [Connection] Candidate activity START → send_activity_start() called
... 1900 chunks sent over ~38 seconds ...
14:06:23 last audio chunk logged
(11 seconds of nothing)
14:06:34 [Agent] ERROR: sent 1011 (internal error) keepalive ping timeout

No "VAD: speech END" line is ever logged, so no ActivityEnd is ever sent. Gemini waits until the connection drops.

What we’ve tried / discovered

Removed send_audio_stream_end() after ActivityEnd - we had been calling it as a “flush” after ActivityEnd, based on a community suggestion. Per the Vertex AI reference docs, “An AudioStreamEnd isn’t sent in this configuration. Instead, any interruption of the stream is marked by an ActivityEnd message.” Removing it didn’t fully resolve the issue.
Draining the inbound queue before ActivityEnd - implemented a queue drain before sending ActivityEnd to prevent post-boundary audio from reaching Gemini.
Track-timeout silence counting - modified our VAD to count WebRTC track timeouts as silence accumulation, so speech END fires even when track.recv() stops delivering frames. This partially helps Mode B.
_activity_ended flag on send_audio - blocks the audio send loop from sending chunks after ActivityEnd.
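The track-timeout silence counting item above can be sketched as follows. This is a simplified illustration, not our production code: the class name `AmplitudeVad`, the peak-amplitude threshold, and the 800 ms silence window are all assumptions chosen for the example. The point is that a recv() timeout is fed into the same silence counter as a quiet frame.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AmplitudeVad:
    """Amplitude VAD that treats missing frames as silence (values illustrative)."""

    threshold: int = 500          # peak amplitude (16-bit PCM) counted as speech
    silence_ms_to_end: int = 800  # accumulated silence needed to fire speech END
    speaking: bool = False
    silence_ms: int = 0

    def _add_silence(self, ms: int) -> Optional[str]:
        # Silence only matters while a turn is open.
        if not self.speaking:
            return None
        self.silence_ms += ms
        if self.silence_ms >= self.silence_ms_to_end:
            self.speaking = False
            self.silence_ms = 0
            return "end"
        return None

    def on_frame(self, samples: Sequence[int], frame_ms: int = 20) -> Optional[str]:
        """Feed one decoded audio frame; returns "start", "end", or None."""
        peak = max((abs(s) for s in samples), default=0)
        if peak >= self.threshold:
            self.silence_ms = 0
            if not self.speaking:
                self.speaking = True
                return "start"
            return None
        return self._add_silence(frame_ms)

    def on_timeout(self, waited_ms: int) -> Optional[str]:
        """Call when track.recv() timed out: the wait counts as silence."""
        return self._add_silence(waited_ms)
```

In the bridge loop, the `"start"` and `"end"` return values would be the points where ActivityStart and ActivityEnd are sent.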

I would appreciate any guidance on the following questions:

Is ActivityEnd guaranteed to trigger model inference, or are there conditions under which Gemini ignores it? We’re seeing cases where it’s sent cleanly (queue drained first, no post-boundary audio) and Gemini still doesn’t respond.
Does audio arriving in the ~50–200ms after ActivityEnd corrupt the turn boundary? The genai SDK serialises messages but our audio loop runs concurrently — is there a race at the SDK’s websocket layer?
Is there a known issue with gemini-2.5-flash-native-audio-preview-12-2025 and manual VAD mode?
What is the recommended pattern for guaranteeing ActivityEnd is the last message Gemini receives before inference? The SDK’s send_realtime_input for audio and for activity signals appear to go through different code paths - is there a flush/sync mechanism?
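To make the ordering concern in the last question concrete, here is a client-side sketch of one possible approach: funnel every outbound message through a single queue and a single writer task, so nothing can be sent between the final audio chunk and ActivityEnd. Everything here is an assumption for illustration: `FakeSession` only records calls in place of the real Live session, `OrderedSender` is a hypothetical wrapper, and the `True` payloads stand in for the real ActivityStart / ActivityEnd objects. It does not tell us whether the SDK already guarantees this ordering.

```python
import asyncio


class FakeSession:
    """Stand-in for the Live session; records message kinds in send order."""

    def __init__(self):
        self.sent = []

    async def send_realtime_input(self, **kwargs):
        await asyncio.sleep(0)  # yield control, as a real network send would
        self.sent.append(next(iter(kwargs)))


class OrderedSender:
    """Single writer: all messages pass through one queue and one task."""

    def __init__(self, session):
        self._session = session
        self._queue = asyncio.Queue()
        self._turn_open = False

    def activity_start(self):
        self._turn_open = True
        self._queue.put_nowait({"activity_start": True})  # placeholder payload

    def audio(self, chunk):
        # Chunks arriving outside an open turn are dropped, not queued.
        if self._turn_open:
            self._queue.put_nowait({"audio": chunk})

    def activity_end(self):
        if self._turn_open:
            self._turn_open = False
            self._queue.put_nowait({"activity_end": True})  # placeholder payload

    def close(self):
        self._queue.put_nowait(None)  # sentinel that stops the writer

    async def run(self):
        while True:
            msg = await self._queue.get()
            if msg is None:
                return
            await self._session.send_realtime_input(**msg)


async def demo():
    session = FakeSession()
    sender = OrderedSender(session)
    writer = asyncio.create_task(sender.run())
    sender.activity_start()
    sender.audio(b"\x00\x00")
    sender.activity_end()
    sender.audio(b"\x01\x01")  # late chunk after the boundary, dropped
    sender.close()
    await writer
    return session.sent
```

Because the enqueue methods are synchronous, the turn flag and the queue are updated in one step on the event loop, which is what rules out a late audio chunk slipping in after ActivityEnd on the client side.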

Mustan_lokhand · April 21, 2026, 7:43pm

Hi @olaniyi_george, can you DM me your project number?

olaniyi_george · April 24, 2026, 5:43pm

Hi @Mustan_lokhand, I have sent our project number to you by DM. Do you have any insights you can share now that we can use while a fix for this bug is in progress?
