Gemini Live Audio WebSocket closes immediately after opening

Hello,

We are trying to set up a live audio dialog with a Gemini model. Below is our configuration and the issue we are facing.

Our Configuration:

  • Endpoint: wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent

  • Model: gemini-2.5-flash-native-audio-dialog

  • Authentication: OAuth 2.0 via a service account (with the Vertex AI User role). We successfully obtain a valid Bearer token.

  • Setup Message (sent immediately after WebSocket opens):

{
  "setup": {
    "model": "gemini-2.5-flash-native-audio-dialog",
    "generationConfig": {
      "responseModalities": ["AUDIO"]
    },
    "systemInstruction": {
      "parts": [
        { "text": "You are a French as a Foreign Language teacher (FLE). Conduct a natural oral DELF B1 conversation." }
      ]
    }
  }
}
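
For anyone who wants to reproduce this, the setup payload above can also be built programmatically. The following is a minimal sketch in Python using only the standard library; the function name `build_setup_message` is hypothetical and is not part of any Google SDK. It only serializes the message and does not open a connection.

```python
import json


def build_setup_message(model: str, system_text: str) -> str:
    """Serialize the first client message of the session as JSON text.

    Hypothetical helper: it mirrors the structure of the setup message
    shown above and nothing more.
    """
    payload = {
        "setup": {
            "model": model,
            "generationConfig": {"responseModalities": ["AUDIO"]},
            "systemInstruction": {"parts": [{"text": system_text}]},
        }
    }
    # ensure_ascii=False keeps accented characters readable in logs.
    return json.dumps(payload, ensure_ascii=False)


if __name__ == "__main__":
    print(
        build_setup_message(
            "gemini-2.5-flash-native-audio-dialog",
            "You are a French as a Foreign Language teacher (FLE). "
            "Conduct a natural oral DELF B1 conversation.",
        )
    )
```

Logging the exact string returned here next to the timestamp of the close event should make it possible to tell whether the message left the client before the connection dropped.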

The Problem:

  1. ✅ Token is obtained successfully.

  2. ✅ The WebSocket connection opens (onopen fires).

  3. ❌ The server closes the connection immediately, either before our setup message is sent or right after. We receive no error code, reason, or other diagnostic message from the Gemini server.
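
To make the close event easier to report, a numeric close code can be translated into the meaning given in RFC 6455, section 7.4.1. Codes 1005 and 1006 are reserved for the case where no status code or no close frame was received, which may be what "no error code" looks like on the client side. The sketch below is a hypothetical logging helper (`describe_close` is our own name), written in Python with no dependencies; it does not talk to the server.

```python
# Close codes defined in RFC 6455, section 7.4.1.
CLOSE_CODES = {
    1000: "normal closure",
    1001: "going away",
    1002: "protocol error",
    1003: "unsupported data",
    1005: "no status code present in the close frame",
    1006: "abnormal closure (no close frame received)",
    1007: "invalid payload data",
    1008: "policy violation",
    1009: "message too big",
    1010: "mandatory extension missing",
    1011: "internal server error",
    1015: "TLS handshake failure",
}


def describe_close(code: int, reason: str = "") -> str:
    """Format a close code and optional reason string as one log line."""
    meaning = CLOSE_CODES.get(code, "unregistered or application-specific code")
    detail = reason if reason else "<empty>"
    return f"close code={code} ({meaning}), reason={detail}"


if __name__ == "__main__":
    # Example values only; these are not codes observed from the Gemini server.
    print(describe_close(1006))
    print(describe_close(1008, "example reason"))
```

Calling this from the client's close handler with the code and reason it receives gives a single line that can be pasted into a bug report.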

Our Questions:

  1. Which Google model is currently stable and functional for live, two-way audio dialog? Is the model we are using (gemini-2.5-flash-native-audio-dialog) correct and supported?

  2. Is our endpoint correct for this model?

  3. Is our setup message format correct? Does the model require a specific audio format or a different initial interaction?

  4. Why is the connection closing without any explanation? How can we get proper diagnostics or logs from the server side to understand the root cause?
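
Related to question 3: our understanding, which is an assumption from reading the Live API documentation and not something we have observed, is that the server should acknowledge the setup message with a JSON message containing a `setupComplete` field before any audio is exchanged. A minimal Python sketch of how a client could check the first server frame for that acknowledgement follows; `is_setup_complete` is a hypothetical helper name.

```python
import json
from typing import Union


def is_setup_complete(raw: Union[str, bytes]) -> bool:
    """Return True if a server frame is a JSON object with a `setupComplete` key.

    Assumption: the acknowledgement arrives as JSON, possibly in a binary
    frame, with a top-level `setupComplete` field.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8", errors="replace")
    try:
        message = json.loads(raw)
    except json.JSONDecodeError:
        # Not JSON at all, so it cannot be the acknowledgement.
        return False
    return isinstance(message, dict) and "setupComplete" in message
```

If the connection closes before any frame passes this check, the failure happens at or before the setup step, which narrows down where to look.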

We need a working configuration for a production live audio application. Please advise on the correct setup.

Thank you.