Gemini Live API — gemini-2.5-flash-native-audio-latest / preview-12-2025 returns code=1011 mid-turn at ~80-90% rate (started 2026-05-27 ~17:00 UTC)
Endpoint: wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent
Model: gemini-2.5-flash-native-audio-latest (and gemini-2.5-flash-native-audio-preview-12-2025)
SDK: Direct WebSocket, no SDK layer
Severity: Production users seeing WebSocket disconnects on simple text turns.
Summary
Live API turns on gemini-2.5-flash-native-audio-latest and the underlying preview-12-2025 close the WebSocket with code=1011, reason="Internal error encountered." on roughly 80-90% of turns. Both the clientContent (text) and realtimeInput.audio (audio) request paths are affected. The behavior started on 2026-05-27 between 17:11 and 18:00 UTC and has been continuous since.
The same setup payload, byte for byte, sent to gemini-2.5-flash-native-audio-preview-09-2025 succeeds every time (N=20 text, N=5 audio).
Timeline
| Window (UTC) | Observation |
|---|---|
| Through 2026-05-27 17:11 | ~1% 1011 rate over ~1500 turns. Recovered by single retry with sessionResumption.handle. |
| 2026-05-27 ~18:00 → present | Same client, same payload, same key — 80-90% 1011 rate per turn. |
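The single-retry recovery mentioned in the first row can be sketched roughly as below. This is an illustration under assumptions, not the production client: `createResumptionTracker`, `buildSetup`, and `shouldRetry` are hypothetical helper names, and the retry policy (one retry, only on 1011) is assumed from the description above. The `newHandle` field is the one referred to in the wire-level trace later in this report.

```javascript
// Hypothetical helper: remembers the most recent resumption handle the server sent.
function createResumptionTracker() {
  let handle = null;
  return {
    // Feed every parsed server message through this method.
    observe(msg) {
      const update = msg.sessionResumptionUpdate;
      // An empty update (no newHandle) leaves the stored handle unchanged.
      if (update && typeof update.newHandle === 'string' && update.newHandle.length > 0) {
        handle = update.newHandle;
      }
    },
    current() {
      return handle;
    },
  };
}

// Hypothetical helper: copies the base setup and attaches the handle for the retry.
function buildSetup(baseSetup, handle) {
  return { setup: { ...baseSetup, sessionResumption: { handle } } };
}

// Assumed policy: retry exactly once, and only when the close code is 1011.
function shouldRetry(closeCode, attempt) {
  return closeCode === 1011 && attempt === 0;
}
```

On a 1011 close, the client would reconnect and send `buildSetup(base, tracker.current())`. If the server only ever sent an empty `sessionResumptionUpdate`, `tracker.current()` is still `null` and the retry starts a fresh session instead of resuming.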
Reproduction
Standalone Node.js + ws, no SDK. Production setup payload (model, generationConfig, contextWindowCompression, sessionResumption, realtimeInputConfig, systemInstruction, tools, audio transcription) + one clientContent turn:
import WebSocket from 'ws';

const URL = `wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent?key=${process.env.GEMINI_API_KEY}`;
const MODEL = process.env.MODEL || 'gemini-2.5-flash-native-audio-latest';

const ws = new WebSocket(URL);

// Send the setup message as soon as the socket opens.
ws.on('open', () => {
  ws.send(JSON.stringify({
    setup: {
      model: `models/${MODEL}`,
      generationConfig: {
        responseModalities: ['AUDIO'],
        speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName: 'Puck' } } },
        thinkingConfig: { thinkingBudget: 0 },
      },
      contextWindowCompression: { triggerTokens: 25000, slidingWindow: { targetTokens: 12500 } },
      sessionResumption: { handle: null },
      realtimeInputConfig: { automaticActivityDetection: { disabled: true } },
      systemInstruction: { parts: [{ text: 'You are a helpful assistant.' }] },
      inputAudioTranscription: {},
      outputAudioTranscription: {},
    },
  }));
});

// Once the server acknowledges setup, send a single text turn.
ws.on('message', (data) => {
  const msg = JSON.parse(data.toString());
  if (msg.setupComplete) {
    ws.send(JSON.stringify({
      clientContent: {
        turns: [{ role: 'user', parts: [{ text: 'What is the SSI resource limit?' }] }],
        turnComplete: true,
      },
    }));
  }
});

// Log the close code and reason; failing runs print code=1011.
ws.on('close', (code, reason) => console.log(`closed code=${code} reason="${reason}"`));
Results (2026-05-27 17:42 UTC, N=10):
| Model | OK | 1011 |
|---|---|---|
| gemini-2.5-flash-native-audio-latest | 1/10 | 9/10 |
| gemini-2.5-flash-native-audio-preview-12-2025 | 1/5 | 4/5 |
| gemini-2.5-flash-native-audio-preview-09-2025 | 20/20 | 0/20 |
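The report does not show the loop that produced these counts. One way to tally repeated runs into rows of this shape is sketched below. The trial record shape (`completed`, `closeCode`) and the function names `tallyTrials` and `formatRow` are assumptions made for illustration.

```javascript
// Summarise repeated trials into OK / 1011 / other counts.
// Assumed record shape: { completed: boolean, closeCode: number | null },
// where completed means the turn finished before the socket closed.
function tallyTrials(trials) {
  const summary = { total: trials.length, ok: 0, code1011: 0, other: 0 };
  for (const trial of trials) {
    if (trial.completed) summary.ok += 1;
    else if (trial.closeCode === 1011) summary.code1011 += 1;
    else summary.other += 1;
  }
  // Fraction of trials that ended in a 1011 close (0 when there are no trials).
  summary.failureRate = summary.total === 0 ? 0 : summary.code1011 / summary.total;
  return summary;
}

// Render one table row in the same layout as the results table.
function formatRow(model, summary) {
  return `| ${model} | ${summary.ok}/${summary.total} | ${summary.code1011}/${summary.total} |`;
}
```

Keeping a separate `other` bucket stops unrelated closes, such as network drops, from being counted as either successes or 1011 failures.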
Audio-input path (same setup payload, realtimeInput.audio PCM 16kHz s16le, activityStart/activityEnd) on -latest: 4/5 → 1011, same reason="Internal error encountered." On 09-2025: 5/5 OK.
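For reference, the audio turn described above could be assembled roughly as follows. This is a sketch under assumptions rather than the exact production code: the `buildAudioTurn` name, the 3200-byte chunk size (100 ms of 16 kHz s16le audio, assuming mono), and the `audio/pcm;rate=16000` MIME string are illustrative choices. The `realtimeInput` field names should be checked against the current Live API reference.

```javascript
// Build the ordered messages for one manually delimited audio turn.
// pcm: Buffer of 16 kHz, 16-bit little-endian samples (mono assumed).
// chunkBytes: assumed chunk size; 3200 bytes = 100 ms at 32000 bytes/s.
function buildAudioTurn(pcm, chunkBytes = 3200) {
  // Automatic activity detection is disabled in setup, so the client marks the start.
  const messages = [{ realtimeInput: { activityStart: {} } }];
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    const chunk = pcm.subarray(offset, offset + chunkBytes);
    messages.push({
      realtimeInput: {
        audio: { data: chunk.toString('base64'), mimeType: 'audio/pcm;rate=16000' },
      },
    });
  }
  // The client also marks the end of speech.
  messages.push({ realtimeInput: { activityEnd: {} } });
  return messages;
}
```

Each element would be sent with `ws.send(JSON.stringify(message))` after `setupComplete` arrives, in place of the `clientContent` turn used in the text repro.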
Wire-level pattern
Failure case (typical):
client → setup
server → setupComplete (≈200-400ms)
client → clientContent { turns: [...], turnComplete: true }
server → sessionResumptionUpdate { } (empty — no newHandle)
server → close code=1011 "Internal error encountered." (≈200-3000ms later)
Success case (rare on -latest, always on 09-2025):
... setupComplete → clientContent →
server → sessionResumptionUpdate { } → toolCall → toolResponse → serverContent.modelTurn (audio) → ... → turnComplete
Failure point is always between clientContent (or realtimeInput.audio) and the first serverContent.modelTurn. No content emitted before the close.
Audio path failure on -latest is identical except clientContent is replaced by realtimeInput.audio chunks + activityEnd.
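To check the "always before the first modelTurn" observation across many captured sessions, a classifier along these lines could be run over logged traces. It is only a sketch: `classifySession` and its label strings are hypothetical, and it assumes turn completion arrives as `serverContent.turnComplete`, which the abbreviated traces above do not spell out.

```javascript
// Classify one session from its ordered, parsed server messages and its close code.
function classifySession(serverMessages, closeCode) {
  let sawSetupComplete = false;
  let sawModelTurn = false;
  let sawTurnComplete = false;
  for (const msg of serverMessages) {
    if (msg.setupComplete) sawSetupComplete = true;
    if (msg.serverContent && msg.serverContent.modelTurn) sawModelTurn = true;
    if (msg.serverContent && msg.serverContent.turnComplete) sawTurnComplete = true;
  }
  if (sawTurnComplete) return 'ok';
  // The pattern reported here: setup succeeded, then 1011 before any model content.
  if (closeCode === 1011 && sawSetupComplete && !sawModelTurn) {
    return 'closed-before-first-model-turn';
  }
  if (closeCode === 1011) return 'closed-other-1011';
  return 'other';
}
```

If the observation holds, every failing session on -latest should land in `closed-before-first-model-turn`. Any session labelled `closed-other-1011` would point to a different failure mode.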
What we need
- Confirmation of the regression in gemini-2.5-flash-native-audio-preview-12-2025.
- Either restore reliability on 12-2025, or repoint the gemini-2.5-flash-native-audio-latest alias to a backend that is not failing.
Environment
- Endpoint: generativelanguage.googleapis.com/ws/.../BidiGenerateContent (v1beta)
- Auth: API key (?key=...)
- Region: default routing
- Live API quota (per AI Studio): 8/Unlimited RPM, 12K/1M TPM, 140/Unlimited RPD — well under limits
- Reproduced from two US-West network egress points (developer machine + cloud deployment)