Scheduling: "SILENT" in NON_BLOCKING function response not preventing duplicate audio generation

Hi,

I’m building a voice assistant with the Gemini Live API (BidiGenerateContent), and I’m running into an issue with NON_BLOCKING functions and the scheduling: “SILENT” parameter.

Setup:

  • Model: gemini-2.5-flash-native-audio-preview-12-2015

  • Function definition with “behavior”: “NON_BLOCKING”:

    {
      "name": "play_animation",
      "description": "Play a robot animation",
      "behavior": "NON_BLOCKING",
      "parameters": { … }
    }
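For reference, this is roughly how the tool declaration goes into the BidiGenerateContent setup message. A minimal Python sketch of the wire payload (field names follow the Live API's camelCase JSON; `build_setup_message` is our own helper and the `parameters` schema here is a placeholder, not our real one):

```python
def build_setup_message(model: str) -> dict:
    """Build the Live API 'setup' message declaring play_animation as NON_BLOCKING."""
    return {
        "setup": {
            "model": model,
            "tools": [{
                "functionDeclarations": [{
                    "name": "play_animation",
                    "description": "Play a robot animation",
                    # NON_BLOCKING lets the model keep generating while the tool runs
                    "behavior": "NON_BLOCKING",
                    # Placeholder schema; the real definition takes more fields
                    "parameters": {
                        "type": "OBJECT",
                        "properties": {"animation": {"type": "STRING"}},
                    },
                }]
            }],
        }
    }
```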

Scenario:

  1. User says: “Hello”

  2. Model responds with “Hello! What can I help you with today?” and simultaneously triggers a play_animation function call (to wave hello)

  3. We send the function response with scheduling: “SILENT”:

    {
      "toolResponse": {
        "functionResponses": [{
          "id": "...",
          "name": "play_animation",
          "response": {
            "result": {"status": "started"},
            "scheduling": "SILENT"
          }
        }]
      }
    }

  4. Model speaks “Hello! What can I help you with today?” again
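For clarity, the toolResponse from step 3 can be expressed as a small Python sketch of the same wire payload (`build_silent_tool_response` is our own helper name; per the Live API docs, the scheduling hint sits inside the response object of the FunctionResponse):

```python
def build_silent_tool_response(call_id: str, name: str, result: dict) -> dict:
    """Build a toolResponse that asks the model not to produce follow-up output."""
    return {
        "toolResponse": {
            "functionResponses": [{
                "id": call_id,       # must echo the id from the incoming toolCall
                "name": name,
                "response": {
                    "result": result,
                    # SILENT: acknowledge the result without generating new audio
                    "scheduling": "SILENT",
                },
            }]
        }
    }
```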

Result:

The same response is spoken twice, and both audio streams arrive within the same turnComplete. The transcript shows:

“Hello! What can I help you with today?Hello! What can I help you with today?”

Timeline from logs:

13:20:31.153 - Audio chunks start (“Hello! What can I help you with today?”)

13:20:31.553 - toolCall received (play_animation)

13:20:31.573 - Sent toolResponse with scheduling: “SILENT”

          [~1.5 second gap]

13:20:33.069 - More audio chunks arrive (same message repeated!)

13:20:34.795 - turnComplete received

Expected Behavior:

With scheduling: “SILENT”, the model should silently acknowledge the function result without generating any follow-up audio. The first “Hello! What can I help you with today?” should be the only response.

Actual Behavior:

The model generates the same audio response twice, suggesting it either:

  1. Ignores scheduling: “SILENT” for NON_BLOCKING functions

  2. Has already queued/generated the second response before receiving our SILENT response

Question:

Is this expected behavior? How can I prevent the model from generating duplicate audio after a NON_BLOCKING function call when the initial response already contains the intended message?

Environment:

  • WebSocket API: wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1alpha.GenerativeService.BidiGenerateContent

  • Platform: Android (Kotlin)

Thank you for any guidance!

I ran another test in which I skipped sending the tool response for play_animation entirely, to check whether our tool response was what triggered the duplicate audio.

Result: The duplicate audio was still generated, even without any tool response being sent.

Log evidence:

13:58:01.117 Google: Tool call - play_animation

13:58:01.142 [Our app] Skipping tool response (no sendGoogleToolResult called)

13:58:02.751 Google: Audio chunk received, 46080 bytes ← Second response starts

13:58:05.766 Google output: “Hello! It’s great to meet you. How can I help you today?Hello! It’s great to meet you. How can I help you today?”

Conclusion: This confirms the duplicate audio is generated automatically by the Live API for NON_BLOCKING functions, regardless of whether:

  1. We send a tool response with scheduling: “SILENT”

  2. We send a tool response without scheduling

  3. We don’t send any tool response at all

The model appears to automatically generate a follow-up response in parallel when executing a NON_BLOCKING function, and there seems to be no way to prevent this from the client side.

I had the same issue.

Hi @Ei_Kyaw, @Jean-Marc_Gourier ,

Welcome to the community! Apologies for the late response.

I can see you have attached a log, but to understand the issue better, could you please share the code snippet (the function definition part) and the exact prompt you sent to the model? I can see you sent ‘Hello’, but I wanted to confirm whether this happens only with a specific prompt!

Thank you!

You can find the function definitions in my public GitHub repository: GitHub - studerus/pepper-android-realtime-chat: an open-source Android framework for low-latency, LLM-driven multimodal interaction on Pepper. It uses end-to-end speech-to-speech models and extensive function calling for agentic robot control (navigation, gaze, vision, touch), and it also runs on regular Android devices.

It happens regardless of the prompt at the beginning of the conversation.

I have exactly the same error.