Gemini 3.1 Flash Live — audio input via WebSocket never triggers a response

I’m building a voice tutoring app using Gemini 3.1 Flash Live Preview via raw WebSocket. Text input works perfectly — when I send a clientContent message, Gemini responds with audio. But when I send audio input via realtimeInput.audio, Gemini never responds. I only see sessionResumptionUpdate messages, never modelTurn or turnComplete.

Setup message:

{
  "setup": {
    "model": "models/gemini-3.1-flash-live-preview",
    "generation_config": {
      "response_modalities": ["AUDIO"],
      "speech_config": {
        "voice_config": {
          "prebuilt_voice_config": { "voice_name": "Leda" }
        }
      }
    },
    "system_instruction": { "parts": [{ "text": "..." }] },
    "realtime_input_config": {
      "automatic_activity_detection": { "disabled": true }
    }
  }
}

Audio message:

{
  "realtimeInput": {
    "audio": {
      "data": "<base64 PCM>",
      "mimeType": "audio/pcm;rate=16000"
    }
  }
}

Audio source: browser AudioContext at 16 kHz, with an AudioWorklet converting float32 samples to int16 PCM in chunks of 320 samples (~20 ms each). I have confirmed that the signal amplitude is above the noise floor.
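
A minimal sketch of the conversion and framing described above, for anyone who wants to compare against their own pipeline. The helper names (`floatToInt16`, `int16ToBase64`, `buildAudioMessage`) are illustrative and not taken from the actual app, and the sketch assumes little-endian sample order and a runtime that provides `btoa` (browsers and recent Node versions).

```typescript
// Sketch only: all function names here are hypothetical helpers.
const SAMPLE_RATE = 16000;  // must match the rate in "audio/pcm;rate=16000"
const CHUNK_SAMPLES = 320;  // 320 / 16000 = 0.02 s, i.e. ~20 ms per chunk

/** Convert float32 samples in [-1, 1] to signed 16-bit PCM, clamping out-of-range values. */
function floatToInt16(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    // Negative values scale by 32768, positive by 32767, so both ends fit in int16.
    out[i] = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
  }
  return out;
}

/** Serialize int16 samples as little-endian bytes (an assumption) and base64-encode them. */
function int16ToBase64(samples: Int16Array): string {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    view.setInt16(i * 2, samples[i], true); // true = little-endian
  }
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

/** Wrap one chunk in the same realtimeInput.audio envelope shown in the audio message above. */
function buildAudioMessage(chunk: Float32Array): string {
  return JSON.stringify({
    realtimeInput: {
      audio: {
        data: int16ToBase64(floatToInt16(chunk)),
        mimeType: `audio/pcm;rate=${SAMPLE_RATE}`,
      },
    },
  });
}
```

Each 320-sample chunk would be passed through `buildAudioMessage` and the resulting string sent over the open WebSocket. The sketch covers only encoding and framing, not the turn signalling around the audio.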

Question: What is the correct format to send audio input that will trigger a spoken response from Gemini 3.1 Flash Live via raw WebSocket? Thank you!