Gemini Live API Issues: 1008/1011 Disconnects, Per-Session Cost, Function Calling, API Logs

Joe_Hu · January 21, 2026, 4:36pm

Executive Summary

We are building an English speaking practice platform using the Gemini Live API for real-time voice conversations. The API provides excellent audio quality and low latency. This report documents 6 issues we’ve encountered, the workarounds we implemented, and our requests for guidance.

Our Use Case

We use Gemini Live API (gemini-2.5-flash-native-audio-preview-12-2025) for real-time voice conversations where an AI examiner asks questions, listens to user responses, and provides spoken scoring feedback at the end.

Session Characteristics:

Duration: 10-14 minutes
Turns: 15-25 per session
User speaking: ~66% of session time
Full conversation context required for accurate scoring at session end

Connection Configuration:

ai.live.connect({
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  config: {
    responseModalities: [Modality.AUDIO],
    temperature: 0,
    speechConfig: {
      voiceConfig: { prebuiltVoiceConfig: { voiceName: "Aoede" } }
    },
    systemInstruction: examinerPrompt,  // ~2500 tokens
    generationConfig: {
      thinkingConfig: { thinkingBudget: 1024 }
    },
    inputAudioTranscription: {},
    outputAudioTranscription: {},
    realtimeInputConfig: {
      automaticActivityDetection: { disabled: true }  // Required for transcription flush
    },
    sessionResumption: {},
    tools: [{ functionDeclarations: scoringTools }]
  }
});

Issues Overview

#	Issue	Status	Request
1	Disconnects (1011/1008)	Mitigation implemented	Root cause, config guidance
2	Not Following Instructions	Three-layer workaround	`toolConfig` support, guidance
3	Repetition	UI workaround	VAD control
4	Transcription Stops (30s+)	Workaround implemented	Fix underlying issue
5	Per-Session Token/Cost	No solution	Token/cost API
6	No API Logs	Client-side logging	Enable logging

Issue 1: WebSocket Disconnects (1011/1008) - CRITICAL

What Happens

WebSocket connections close mid-conversation with codes 1008 (Policy Violation) and 1011 (Internal Error).

Error Messages Captured

From CloseEvent.reason:

// Code 1008 - Policy Violation
"Operation is not implemented, or supported, or enabled."

// Code 1011 - Internal Error
"Failed to run inference"
"Thread was cancelled"
"Thread was cancelled when writing StartStep status to channel.; Failed to close the streaming context; status = CANCELLED:"
"Internal error"
"Internal error encountered."
"RPC::DEADLINE_EXCEEDED"
"RESOURCE_EXHAUSTED"
"service is currently unavailable"

Production Example

Session disconnect at ~10 minutes:

{
  "closeCode": 1011,
  "closeReason": "Thread was cancelled when writing StartStep status to channel.; Failed to close the streaming context; status = CANCELLED:",
  "wasCleanClose": true,
  "goAwayReceivedAt": 1769009709175,
  "goAwayTimeLeftMs": 50000,
  "disconnectTimestamp": 1769009768884,
  "flushEvents": [
    {"type": "started", "timestamp": 1769009246295},
    {"type": "ended", "timestamp": 1769009247197},
    // 19 flush cycles over ~8 minutes (every 15 seconds)
    {"type": "started", "timestamp": 1769009739792},
    {"type": "ended", "timestamp": 1769009740693}
  ]
}
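
The timestamps in this capture can be turned into intervals to see how the close relates to the goAway window. The sketch below only does arithmetic on the values shown above; `firstFlushStartedAt` and `lastFlushEndedAt` are names introduced here for the first and last flush timestamps. Any interpretation of the result is a reading of these numbers, not confirmed server behavior.

```typescript
// Values copied from the production example above (epoch milliseconds).
const capture = {
  goAwayReceivedAt: 1769009709175,
  goAwayTimeLeftMs: 50000,
  disconnectTimestamp: 1769009768884,
  firstFlushStartedAt: 1769009246295, // first "started" flush event
  lastFlushEndedAt: 1769009740693, // last "ended" flush event
};

// Time between the goAway signal and the actual close.
const goAwayToCloseMs = capture.disconnectTimestamp - capture.goAwayReceivedAt;

// How far past the advertised time left the close arrived (positive = later).
const pastDeadlineMs = goAwayToCloseMs - capture.goAwayTimeLeftMs;

// Time between the last completed flush and the close.
const lastFlushToCloseMs = capture.disconnectTimestamp - capture.lastFlushEndedAt;

// Span from the first recorded flush to the close, in minutes.
const firstFlushToCloseMin =
  (capture.disconnectTimestamp - capture.firstFlushStartedAt) / 60000;

console.log({ goAwayToCloseMs, pastDeadlineMs, lastFlushToCloseMs, firstFlushToCloseMin });
```

In this capture the close arrives 59,709 ms after goAway, about 9.7 s past the advertised 50 s, and roughly 28 s after the last flush completed. That is consistent with the session still being open when the goAway grace period ran out, though the capture alone cannot prove that this caused the 1011.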

Observed Patterns

Pattern	Code	Frequency
Sessions 8-12 minutes	1011	Most common
During scoring phase	1008, 1011	Occasional
Mid-conversation	1008, 1011	Occasional

Our Implementation

Auto-Reconnection: Max 3 attempts with exponential backoff (1s, 2s, 4s), resumption token passed to new connection
Context Loss Detection: After reconnect, detect if context was lost by checking for examiner intro phrases → trigger fallback
Server-Side Scoring Fallback: Save transcript and use standard Gemini API for scoring (text-only)
GoAway Handling: Listen for goAway signals and use resumption tokens
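
A minimal sketch of the reconnection and context-loss steps listed above, assuming the connection function and the sleep function are injected. `LiveConnection`, `ConnectFn`, `reconnectWithBackoff`, and `contextLooksLost` are hypothetical names for illustration, not SDK APIs, and the intro phrases are placeholders.

```typescript
// Hypothetical shape of a live connection; a real SDK session type differs.
interface LiveConnection {
  id: string;
}

// `connect` is injected so the sketch stays independent of any SDK.
type ConnectFn = (resumptionHandle?: string) => Promise<LiveConnection>;
type SleepFn = (ms: number) => Promise<void>;

// Delays for exponential backoff: base, 2*base, 4*base, ...
function backoffDelays(maxAttempts: number, baseMs: number): number[] {
  return Array.from({ length: maxAttempts }, (_, i) => baseMs * 2 ** i);
}

const defaultSleep: SleepFn = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Try to reconnect up to `maxAttempts` times, waiting before each attempt and
// passing the latest resumption handle. Returns null when every attempt fails,
// at which point the caller can switch to the server-side scoring fallback.
async function reconnectWithBackoff(
  connect: ConnectFn,
  resumptionHandle: string | undefined,
  maxAttempts = 3,
  baseMs = 1000,
  sleep: SleepFn = defaultSleep,
): Promise<LiveConnection | null> {
  for (const delayMs of backoffDelays(maxAttempts, baseMs)) {
    await sleep(delayMs);
    try {
      return await connect(resumptionHandle);
    } catch {
      // Ignore the failure and fall through to the next, longer delay.
    }
  }
  return null;
}

// After a reconnect, treat an examiner intro phrase in the first model output
// as a sign that the conversation context was lost.
function contextLooksLost(firstOutput: string, introPhrases: string[]): boolean {
  const text = firstOutput.toLowerCase();
  return introPhrases.some((phrase) => text.includes(phrase.toLowerCase()));
}
```

With the defaults, `backoffDelays(3, 1000)` yields the 1s, 2s, 4s schedule described above. Injecting `sleep` keeps the retry loop testable without real waiting.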

Request

Root Cause Documentation: What causes codes 1008 and 1011? Are there limits on session duration, context size, or token count?
Configuration Guidance for 10+ Minute Sessions:
- Recommended configurations for 7-12 minute sessions?
- Should we proactively reconnect before the 10-minute mark?
- Optimal frequency for activityEnd/activityStart flushes?
Code 1008 Clarification: What operation triggers “Operation is not implemented, or supported, or enabled”?

Issue 2: Not Following Conversation Instructions - CRITICAL

What Happens

The model does not reliably follow conversation flow instructions:

Incomplete Spoken Output: When instructed to speak all 5 band scores aloud, the model frequently stops after 1-2 scores or skips them entirely
Function Calling: The reportScoringResults function is called in only ~60-70% of sessions
Premature Responses: The model sometimes gives feedback without waiting for the user’s response

Our Use Case

The AI examiner must:

Ask questions → wait for user response → repeat
After all questions, speak all 5 band scores aloud with explanations
Call reportScoringResults function to return structured data

Users require real-time spoken interaction. Only the Live API provides this capability.

Prompt Instructions

We use explicit instructions:

⚠️ CRITICAL: WAIT FOR FINAL ANSWER BEFORE CLOSING
After asking your FINAL question:
1. STOP TALKING completely
2. WAIT for the candidate's complete response (may be 30-60 seconds)
3. Only THEN proceed to the ending sequence

🚨 CRITICAL: You MUST speak ALL 5 scores below. DO NOT STOP after 1 or 2 scores.

Three-Layer Extraction Workaround

Since spoken output is not guaranteed, we extract scores through multiple fallback layers:

Function Calling (reportScoringResults): Primary method
Text Block Parsing: [BAND_SCORES]...[/BAND_SCORES] markers
Regex Extraction: Parse from spoken transcript

// Fallback regex patterns
/(?:i\s+give\s+you|you\s+(?:get|receive))\s*(?:a|an)?\s*(\d+(?:\.\d+)?)\s*(?:for\s*)?(?:fluency)/i
/(?:overall|your overall)\s*(?:band\s*)?(?:score)\s*(?:is|would be)\s*(\d+(?:\.\d+)?)/i
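
The three layers can be chained so that each one is tried only when the previous one yields nothing. The sketch below is one possible arrangement. The two regexes are the ones shown above, but the `SessionOutput` shape and the `key: value` line format inside the `[BAND_SCORES]` block are assumptions made for illustration.

```typescript
type Scores = Record<string, number>;
type Source = "function_call" | "text_block" | "regex";

// Hypothetical container for what one session produced.
interface SessionOutput {
  functionCallArgs?: Scores; // arguments of reportScoringResults, if it was called
  text: string; // model text that may contain a [BAND_SCORES] block
  transcript: string; // transcription of the spoken output
}

// Layer 2: parse "name: value" lines between the markers (format assumed).
function fromTextBlock(text: string): Scores | null {
  const block = text.match(/\[BAND_SCORES\]([\s\S]*?)\[\/BAND_SCORES\]/);
  if (!block) return null;
  const scores: Scores = {};
  for (const line of block[1].split("\n")) {
    const kv = line.match(/^\s*([A-Za-z_ ]+?)\s*:\s*(\d+(?:\.\d+)?)\s*$/);
    if (kv) scores[kv[1].trim().toLowerCase()] = Number(kv[2]);
  }
  return Object.keys(scores).length > 0 ? scores : null;
}

// Layer 3: the fallback regexes from the post, applied to the transcript.
function fromTranscript(transcript: string): Scores | null {
  const patterns: Array<[string, RegExp]> = [
    ["fluency", /(?:i\s+give\s+you|you\s+(?:get|receive))\s*(?:a|an)?\s*(\d+(?:\.\d+)?)\s*(?:for\s*)?(?:fluency)/i],
    ["overall", /(?:overall|your overall)\s*(?:band\s*)?(?:score)\s*(?:is|would be)\s*(\d+(?:\.\d+)?)/i],
  ];
  const scores: Scores = {};
  for (const [name, pattern] of patterns) {
    const match = transcript.match(pattern);
    if (match) scores[name] = Number(match[1]);
  }
  return Object.keys(scores).length > 0 ? scores : null;
}

// Try each layer in priority order and report which one produced the result.
function extractScores(out: SessionOutput): { source: Source; scores: Scores } | null {
  if (out.functionCallArgs) return { source: "function_call", scores: out.functionCallArgs };
  const block = fromTextBlock(out.text);
  if (block) return { source: "text_block", scores: block };
  const spoken = fromTranscript(out.transcript);
  if (spoken) return { source: "regex", scores: spoken };
  return null;
}
```

Returning the `source` alongside the scores makes it possible to log how often each layer is actually needed, which is the same data behind the ~60-70% function-call figure.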

Key Limitation

toolConfig.functionCallingConfig with mode: ANY is not available in Live API:

// Standard Gemini API - supported
toolConfig: { functionCallingConfig: { mode: 'ANY' } }

// Live API - toolConfig not in LiveConnectConfig

Request

toolConfig Support: Add functionCallingConfig to Live API
Guidance for Instruction Following: How to improve reliability of:
- Speaking complete requested content
- Waiting for user response before proceeding
- Calling functions when instructed

Issue 3: Repetition - MEDIUM

What Happens

The AI examiner sometimes repeats itself unprompted, even in quiet environments.

Context

We disable automaticActivityDetection to implement the transcription flush workaround (Issue 4). This prevents us from tuning VAD sensitivity.

Our Workaround

A UI reminder asks users to wear headphones or move to a quiet environment.

Request

Ability to configure VAD sensitivity while keeping manual activity control (activityEnd/activityStart)
OR improved transcription that removes the need for periodic flush

Issue 4: Transcription Stops During Long Speech (30s+) - SOLVED

What Happens

inputTranscription events stop or degrade during continuous user speech longer than ~30 seconds.

Solution

Credit to the community for this workaround.

Disable automatic activity detection:

realtimeInputConfig: {
  automaticActivityDetection: { disabled: true }
}

Send periodic flush every 15 seconds:

session.send({ realtimeInput: { activityEnd: {} } });     // activityEnd
session.send({ realtimeInput: { activityStart: {} } });   // activityStart
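
One way to schedule the flush is a small timer wrapper, sketched below. It assumes the flush is an activityEnd followed immediately by an activityStart on the realtime input channel. The `send` function is injected, and `flushOnce` and `startPeriodicFlush` are hypothetical helper names rather than SDK calls.

```typescript
// Message shapes assumed for the manual activity signals.
type FlushMessage =
  | { realtimeInput: { activityEnd: Record<string, never> } }
  | { realtimeInput: { activityStart: Record<string, never> } };

type SendFn = (message: FlushMessage) => void;

// One flush: close the current activity window, then immediately open a new one.
function flushOnce(send: SendFn): void {
  send({ realtimeInput: { activityEnd: {} } });
  send({ realtimeInput: { activityStart: {} } });
}

// Flush every `intervalMs` while the user is speaking. Returns a stop function
// that the caller should invoke when the user's turn ends or the socket closes.
function startPeriodicFlush(send: SendFn, intervalMs = 15_000): () => void {
  const timer = setInterval(() => flushOnce(send), intervalMs);
  return () => clearInterval(timer);
}
```

Returning a stop function keeps the timer's lifetime tied to the user's turn, so no flush is sent while the model is speaking or after a disconnect.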

Results

Same 2-minute continuous speech:

Without flush: 285 characters
With flush: 2,064 characters

Request

Fix the underlying transcription issue so that inputTranscription events are reliably delivered during continuous speech without requiring manual flush workarounds.

Issue 5: Per-Session Token/Cost Visibility - NEEDS FEATURE

What Happens

Cannot accurately track token consumption or calculate costs for individual Live API sessions.

Pricing (from documentation)

Component	Price per 1M tokens
Input text	$0.50
Input audio	$3.00
Output text	$2.00
Output audio	$12.00

Audio token rate: 32 tokens per second. Billing is per turn for the entire context window (cumulative).
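
Until per-session usage data is available, a rough client-side estimate can be built from the figures above. The sketch below assumes that each turn bills the whole accumulated audio context as audio input plus that turn's new model audio as output. It ignores text tokens (system prompt, thinking, transcripts). That billing model is an interpretation of "cumulative", not confirmed behavior, and `estimateSessionCost` is a hypothetical helper.

```typescript
// Prices per 1M tokens and the audio token rate, copied from the post.
const PRICE_PER_MILLION = { inputAudio: 3.0, outputAudio: 12.0 };
const AUDIO_TOKENS_PER_SECOND = 32;

interface Turn {
  userAudioSec: number;
  modelAudioSec: number;
}

interface CostEstimate {
  inputTokens: number;
  outputTokens: number;
  usd: number;
}

// Assumption: every turn re-bills all prior audio as input (cumulative context).
function estimateSessionCost(turns: Turn[]): CostEstimate {
  let contextTokens = 0;
  let inputTokens = 0;
  let outputTokens = 0;
  for (const turn of turns) {
    const userTokens = turn.userAudioSec * AUDIO_TOKENS_PER_SECOND;
    const modelTokens = turn.modelAudioSec * AUDIO_TOKENS_PER_SECOND;
    contextTokens += userTokens;
    inputTokens += contextTokens; // prior context is billed again this turn
    outputTokens += modelTokens;
    contextTokens += modelTokens; // model audio joins the context for later turns
  }
  const usd =
    (inputTokens * PRICE_PER_MILLION.inputAudio +
      outputTokens * PRICE_PER_MILLION.outputAudio) /
    1_000_000;
  return { inputTokens, outputTokens, usd };
}

// Example: 20 turns of 24 s user speech and 12 s model speech (12 minutes total).
const example = estimateSessionCost(
  Array.from({ length: 20 }, () => ({ userAudioSec: 24, modelAudioSec: 12 })),
);
console.log(example);
```

Under these assumptions the 12-minute example comes to 234,240 input tokens and 7,680 output tokens, or about $0.79. Input cost grows roughly quadratically with the number of turns, which is why per-session visibility matters for long conversations.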

Issues

Problem	Description
No per-session data	Cloud Console shows only aggregate usage, not per-session breakdown
Inconsistent `usageMetadata`	`promptTokensDetails` / `responseTokensDetails` appear intermittently in API responses

Request

Per-Session Token Consumption API: Query token usage (text + audio) for individual sessions
Consistent usageMetadata: Always populate modality breakdown
Cloud Console Visibility: Per-session token and cost data

Issue 6: No API Logs - NEEDS FEATURE

What Happens

Live API sessions do not appear in Gemini API Logs and Datasets in Google Cloud Console.

API Type	Visible in Logs?
Standard Gemini API	Yes
Gemini Live API	No

Impact

Cannot debug 1011 errors server-side
Cannot verify token counts
Cannot investigate function call failures
Cannot analyze audio processing issues

Our Workaround

Client-side logging to our database:

WebSocket close codes and reasons
Transcript events
Function call success/failure
Timing data
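
A sketch of how such a record could be assembled on the client. The output field names mirror the production example from Issue 1, while `SessionLogger`, `CloseInfo`, and the method names are hypothetical. The clock is injected so the logger can be tested deterministically.

```typescript
interface FlushEvent {
  type: "started" | "ended";
  timestamp: number;
}

// Minimal subset of a WebSocket close event that the logger needs.
interface CloseInfo {
  code: number;
  reason: string;
  wasClean: boolean;
}

interface DisconnectLog {
  closeCode: number;
  closeReason: string;
  wasCleanClose: boolean;
  goAwayReceivedAt?: number;
  goAwayTimeLeftMs?: number;
  disconnectTimestamp: number;
  flushEvents: FlushEvent[];
}

class SessionLogger {
  private readonly now: () => number;
  private readonly flushEvents: FlushEvent[] = [];
  private goAwayReceivedAt?: number;
  private goAwayTimeLeftMs?: number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Record the start or end of one flush cycle.
  recordFlush(type: FlushEvent["type"]): void {
    this.flushEvents.push({ type, timestamp: this.now() });
  }

  // Record when goAway arrived and how much time the server said was left.
  recordGoAway(timeLeftMs: number): void {
    this.goAwayReceivedAt = this.now();
    this.goAwayTimeLeftMs = timeLeftMs;
  }

  // Build the record to persist when the socket closes.
  buildDisconnectLog(close: CloseInfo): DisconnectLog {
    return {
      closeCode: close.code,
      closeReason: close.reason,
      wasCleanClose: close.wasClean,
      goAwayReceivedAt: this.goAwayReceivedAt,
      goAwayTimeLeftMs: this.goAwayTimeLeftMs,
      disconnectTimestamp: this.now(),
      flushEvents: [...this.flushEvents],
    };
  }
}
```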

Request

Enable Live API Logging: Add to Gemini API Logs and Datasets
Per-Session ID: Correlation between our sessions and Google’s internal logs
Error Details: More information when 1011 occurs

Summary

#	Issue	Our Implementation	Request
1	Disconnects (1011/1008)	Auto-reconnection + scoring fallback	Root cause, config guidance
2	Not Following Instructions	Three-layer extraction	`toolConfig` support, guidance
3	Repetition	UI reminder	VAD control with manual activity
4	Transcription Stops (30s+)	Periodic flush workaround	Fix underlying issue
5	Per-Session Token/Cost	N/A	Token API, consistent `usageMetadata`
6	No API Logs	Client-side logging	Enable Live API logging

Contact

We can provide additional logs, sample sessions, or code samples to help investigate these issues.

icapora · March 7, 2026, 12:33pm

Hi again @Joe_Hu!
We were hitting the same 1008 errors. After investigation, we found the root cause was a race condition in how we handled realtime input during tool calls.

Here’s what solved it for us:

Zero errors in testing so far with this approach.
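
The code from this reply is not shown in the thread. Going only by the description (a race between realtime input and tool calls) and by later replies that call it "the gate", a minimal sketch of such a gate might look like the following. The class name, the method names, and the choice to drop rather than buffer audio while a tool call is pending are all assumptions, not icapora's implementation.

```typescript
type AudioChunk = Uint8Array;
type SendAudioFn = (chunk: AudioChunk) => void;

// Holds back realtime input while any tool call is awaiting its response.
class RealtimeInputGate {
  private readonly sendAudio: SendAudioFn;
  private readonly pendingToolCalls = new Set<string>();
  private dropped = 0;

  constructor(sendAudio: SendAudioFn) {
    this.sendAudio = sendAudio;
  }

  // Call when a tool call message arrives from the server.
  onToolCall(ids: string[]): void {
    for (const id of ids) this.pendingToolCalls.add(id);
  }

  // Call after the matching tool response has been sent.
  onToolResponseSent(ids: string[]): void {
    for (const id of ids) this.pendingToolCalls.delete(id);
  }

  // Forward audio only while no tool call is in flight.
  // Returns true when the chunk was sent, false when it was dropped.
  push(chunk: AudioChunk): boolean {
    if (this.pendingToolCalls.size > 0) {
      this.dropped += 1;
      return false;
    }
    this.sendAudio(chunk);
    return true;
  }

  get droppedCount(): number {
    return this.dropped;
  }
}
```

Dropping is the simplest policy. Buffering the chunks and replaying them once the tool response has gone out would preserve user speech, at the cost of a burst of delayed audio.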

I hope this helps!

Joe_Hu · March 7, 2026, 3:32pm

Thanks for your reply, I’m testing it now!

thilak_reddy · March 7, 2026, 10:04pm

Hi @icapora, I’ve implemented your suggested approach, and while it works perfectly for the 09-2025 model, I’m still encountering the 1008 error on the 12-2025 version.

Cantemir_Mihu · March 9, 2026, 6:48am

I’m also still getting these errors on the 12-2025 model after implementing the gate.

Cantemir_Mihu · March 9, 2026, 7:08am

After some troubleshooting, I found the cause for websocket: close 1008 (policy violation): Operation is not implemented, or supported, or enabled.

The tools config was not passed correctly to the client…

Benjamin_Hughes · March 25, 2026, 10:12pm

I have implemented the gate and I am also still getting errors on the 12-2025 model.

icapora · March 30, 2026, 8:41pm

I migrated to the new Gemini 3.1 Live model; it’s really promising and fixes many of the problems that version 2.5 had.

Joe_Hu · March 30, 2026, 8:55pm

Yes! I migrated two days ago and it’s amazing!

Here is my new feedback: Gemini 3.1 Flash Live — Great upgrade from 2.5, two model behavior observations from production voice app
