Root Cause
When Gemini sends a tool_call, the session enters a state in which it rejects every sendRealtimeInput call, including audio, activityStart, activityEnd, and audioStreamEnd. If your client keeps streaming microphone audio or sending activity signals while the tool call is pending, the server closes the WebSocket with close code 1008 (policy violation).
This happens because there’s a race condition between:
- Your audio input task (continuously streaming mic data)
- Your response processing task (which receives the tool_call)
The tool_call arrives on the response stream, but by the time your code processes it, the audio task may have already sent another sendRealtimeInput frame.
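The interleaving can be reproduced without a live connection. The sketch below is only an illustration: `FakeSession` is a hypothetical stand-in (not the real SDK) that mimics the behaviour described above by raising once a tool call is pending, and `audio_task` / `receive_task` are made-up names for the two competing tasks.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for a Live API session; not the real SDK."""

    def __init__(self):
        self.tool_call_pending = False
        self.closed_with = None

    async def send_realtime_input(self, chunk):
        # Mimics the reported behaviour: any realtime input sent while a
        # tool call is pending closes the connection with code 1008.
        if self.tool_call_pending:
            self.closed_with = 1008
            raise ConnectionError("1008 policy violation")


async def audio_task(session, sent):
    # Ungated mic loop: it keeps sending regardless of what the
    # response-processing task has seen.
    for i in range(5):
        await session.send_realtime_input(f"chunk-{i}")
        sent.append(i)
        await asyncio.sleep(0)  # yield to the other task


async def receive_task(session):
    # The tool_call arrives partway through the audio stream.
    await asyncio.sleep(0)
    session.tool_call_pending = True


async def main():
    session = FakeSession()
    sent = []
    results = await asyncio.gather(
        audio_task(session, sent),
        receive_task(session),
        return_exceptions=True,
    )
    return session.closed_with, sent, results


closed_with, sent, results = asyncio.run(main())
print(closed_with, sent)
```

The audio loop never gets to send all five chunks: the frame it sends after the tool call has arrived is the one that kills the connection, which is the window the fix below closes.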
Fix: Gate all realtime input during tool calls
The solution is simple: add a boolean flag that blocks all sendRealtimeInput calls between receiving a tool_call and sending the corresponding sendToolResponse.
Python (google-genai SDK)
from google.genai import types


class SessionManager:
    def __init__(self, session):
        self.session = session  # an already-connected Live API session
        self._tool_call_pending = False

    async def handle_tool_call(self, tool_call):
        self._tool_call_pending = True  # Block all realtime input
        try:
            for fc in tool_call.function_calls:
                # execute_tool is your own dispatcher; it is not part of the SDK
                result = await execute_tool(fc.name, fc.args)
                response = types.FunctionResponse(
                    id=fc.id, name=fc.name, response=result
                )
                await self.session.send_tool_response(
                    function_responses=[response]
                )
        finally:
            # Runs on success and on error, so no separate except is needed
            self._tool_call_pending = False  # Re-enable realtime input

    async def send_audio(self, audio_data):
        # audio_data should be a types.Blob (data plus mime_type)
        if self._tool_call_pending:
            return  # Skip: Gemini rejects input during tool calls
        await self.session.send_realtime_input(audio=audio_data)

    async def send_activity_start(self):
        if self._tool_call_pending:
            return
        await self.session.send_realtime_input(activity_start={})

    async def send_activity_end(self):
        if self._tool_call_pending:
            return
        await self.session.send_realtime_input(activity_end={})
JavaScript / TypeScript (@google/genai SDK)
import { LiveServerToolCall, Session } from "@google/genai";

class SessionManager {
  private toolCallPending = false;

  // session is an already-connected Live API session
  constructor(private session: Session) {}

  async handleToolCall(toolCall: LiveServerToolCall) {
    this.toolCallPending = true; // Block all realtime input
    try {
      for (const fc of toolCall.functionCalls ?? []) {
        // executeTool is your own dispatcher; it is not part of the SDK
        const result = await executeTool(fc.name, fc.args);
        await this.session.sendToolResponse({
          functionResponses: [{ id: fc.id, name: fc.name, response: result }],
        });
      }
    } finally {
      // Runs on success and on error, so no separate catch is needed
      this.toolCallPending = false; // Re-enable realtime input
    }
  }

  // The SDK expects base64-encoded audio plus a MIME type
  async sendAudio(audio: { data: string; mimeType: string }) {
    if (this.toolCallPending) return;
    await this.session.sendRealtimeInput({ audio });
  }

  async sendActivityStart() {
    if (this.toolCallPending) return;
    await this.session.sendRealtimeInput({ activityStart: {} });
  }

  async sendActivityEnd() {
    if (this.toolCallPending) return;
    await this.session.sendRealtimeInput({ activityEnd: {} });
  }
}
Important notes
- Gate ALL realtime input types, not just audio. activityStart, activityEnd, and audioStreamEnd are also rejected during the tool call window.
- Also store tool names. FunctionResponse requires both id AND name. Store a mapping of function call id → name when you receive the call, since the ID alone isn’t sufficient for the response.
- Use try/finally. Always clear the flag in a finally block to avoid permanently blocking input if the tool execution throws.
- This is a client-side workaround. The underlying issue is server-side (the API should queue or ignore realtime input during tool calls, not terminate the connection). Until Google fixes it, gating input on the client has avoided the disconnect in our testing.
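The id → name bookkeeping from the notes above might look like the following. This is a minimal sketch, not SDK code: `PendingToolCalls` and its methods are hypothetical names, and the returned dict only mirrors the `id`, `name`, and `response` fields mentioned in this post. It also keeps the gate closed until every outstanding call has been answered, which is an assumption that helps when tool results come back out of order.

```python
from dataclasses import dataclass, field


@dataclass
class PendingToolCalls:
    """Tracks function calls that still need a response (hypothetical helper)."""

    names_by_id: dict = field(default_factory=dict)

    def register(self, call_id, name):
        # Remember the name as soon as the tool_call arrives.
        self.names_by_id[call_id] = name

    def resolve(self, call_id, result):
        # Look the name back up so the response carries both id and name.
        name = self.names_by_id.pop(call_id)
        return {"id": call_id, "name": name, "response": result}

    @property
    def gate_closed(self):
        # Realtime input stays blocked while any call is unanswered.
        return bool(self.names_by_id)


pending = PendingToolCalls()
pending.register("call-1", "get_weather")
pending.register("call-2", "get_time")

first = pending.resolve("call-1", {"temp_c": 21})
still_blocked = pending.gate_closed

second = pending.resolve("call-2", {"time": "12:00"})
now_open = not pending.gate_closed

print(first, still_blocked, now_open)
```

A send method would then check `pending.gate_closed` in place of the boolean flag before forwarding any realtime input.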
Results
We have seen zero 1008 errors since adding this workaround. We’re still validating it against various tool-calling scenarios before wider deployment.