[Live API] gemini-live-2.5-flash-native-audio returns no output after setupComplete; gemini-2.0-flash-live-001 not accessible

Summary

Vertex AI Live API (bidiGenerateContent) is not functioning for our project on any of the available Live models. We have a working voice agent that broke between 2026-04-28 evening and 2026-04-29 morning.

Project details

Region tested: us-central1 (also us-east5, us-east1, us-west1, europe-west1, europe-west4, asia-northeast1 — same behavior)

  • Service Account: vertex-express@selma-voice-prod.iam.gserviceaccount.com
  • IAM role: roles/aiplatform.user (Vertex AI User) — granted today, verified to take effect (text-only gemini-2.5-flash:generateContent returns 200 OK in this project)
  • Vertex AI API: Enabled
  • Billing: Active (paid, since 2026-04-26)
  • SDK: google-genai Python 1.73.1 and Node.js 1.50.1
  • API version: v1beta1

Use case

Real-time voice secretary that talks to users in Russian/Ukrainian over WebSocket via client.aio.live.connect(). On 2026-04-28 it was working with gemini-2.0-flash-live-001 end-to-end with ~1.4 s TTFB. By 2026-04-29 morning it stopped — different symptom on different models.

Issue 1 — gemini-2.0-flash-live-001 returns 1008 not-found

client.aio.live.connect(model='gemini-2.0-flash-live-001', config={...})
→ APIError 1008: Publisher Model 'projects/selma-voice-prod/locations/us-central1/publishers/google/models/gemini-2.0-flash-live-001' was not found

Reproduced in 7 regions with identical message. Same for gemini-2.0-flash-live-preview-04-09, gemini-2.0-flash-live, gemini-2.0-flash-live-preview. Text-only gemini-2.5-flash:generateContent works (200 OK), so this is specific to Live API access for 2.0 models.

Issue 2 — gemini-live-2.5-flash-native-audio connects but produces 0 server output

This is the only Live API model that successfully establishes a session for our project.

Reproduction config:

{
  "response_modalities": ["AUDIO"],
  "system_instruction": "You are a helpful assistant. Reply briefly in Russian.",
  "input_audio_transcription": {},
  "output_audio_transcription": {},
  "speech_config": {
    "voice_config": {"prebuilt_voice_config": {"voice_name": "Aoede"}},
    "language_code": "ru-RU"
  }
}

Observed:

  • WebSocket connect: :white_check_mark: ~200 ms
  • setupComplete received: :white_check_mark:
  • Client → server PCM 16-bit 16 kHz, 30 ms frames, audio/pcm;rate=16000, 121 chunks, 116976 bytes (3.6 s of Russian speech): :white_check_mark:
  • Server → client: 0 modelTurn, 0 inputTranscription, 0 outputTranscription, 0 turnComplete, 0 generationComplete for ≥30 s before timeout/disconnect.

Reproduced in 5+ standalone Python smoke runs today on Vertex (SA auth) and AI Studio (api_key) — same silent behavior. Tried with/without inputAudioTranscription, with/without speechConfig.languageCode, with/without prebuiltVoiceConfig, with/without speech audio (silent vs active). All produce zero serverContent.

Issue 3 — gemini-2.5-flash on Live → “is not supported”

client.aio.live.connect(model='gemini-2.5-flash', ...)
→ APIError 1007: gemini-2.5-flash is not supported in the live api

Confirms there is no working Live API model in our project — only gemini-live-2.5-flash-native-audio, and it does not produce server output.

What we need

  1. Confirm whether gemini-2.0-flash-live-001 is still GA for new projects, and if so — restore access for selma-voice-prod. If globally deprecated, please publish that and recommend a stable Live replacement that supports asynchronous tool calling.

  2. Investigate why gemini-live-2.5-flash-native-audio yields zero serverContent in our project despite a clean WebSocket setup and steady client → server PCM stream.

  3. Recommend a stable, GA Live API model on Vertex AI for production with Russian/Ukrainian audio and asynchronous tool calling. SDKs we use: google-genai Python 1.73.1, Node.js 1.50.1.

Happy to share the standalone smoke script (smoke_vertex.py), the test PCM audio, and full session logs on request.

Thanks,
Igor Sokhinov

Were you able to to figure out why this happens and any workaround. We see the same thing in our automated testing suite today. There was no traffic in prod at the time, so I’m not sure if its widespread.

From our team:
Same silent failure on gemini-2.5-flash-native-audio-preview-12-2025, May 2026

Reproducing what others have reported. Bidi WebSocket to
wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent. Single text turn,
responseModalities:[“AUDIO”], voice Puck, automaticActivityDetection:{disabled:true}, sessionResumption:{handle:null}, outputAudioTranscription:{}.

Wire-out:
{“setup”:{ …native-audio-preview-12-2025… }}
→ setupComplete received
{“clientContent”:{“turns”:[{“role”:“user”,“parts”:[{“text”:“What is my name? Say just the name.”}]}],“turnComplete”:true}}

Wire-in on a failing turn (3 events total, ~600ms apart):

  1. {“sessionResumptionUpdate”:{}}
  2. {“serverContent”:{“generationComplete”:true}}
  3. {“serverContent”:{“turnComplete”:true}, “usageMetadata”:{}}

No outputTranscription, no modelTurn, no audio chunks. usageMetadata is empty — zero promptTokenCount, so the prompt was never billed/processed.

Wire-in on a passing turn (same exact wire-out, same session config):

  1. {“sessionResumptionUpdate”:{}}
  2. {“serverContent”:{“outputTranscription”:{“text”:“Zephyr”}}}
  3. {“serverContent”:{}} (audio chunk)
  4. {“serverContent”:{“generationComplete”:true}}
  5. {“serverContent”:{“turnComplete”:true}, “usageMetadata”:{“promptTokenCount”:5409,“responseTokenCount”:19,…}}

Pattern:

  • Bursty in time. 3 fails in a 4-minute window today, then 50+ consecutive successes immediately after on the same code path with byte-identical
    wire-out.
  • A fresh WS reconnect with brand-new setup right after the failure also returns the same empty 3-event pattern. So it’s not stale-session state on
    the client side.
  • WS close code is normal (1005/1000), not 1011. No error event on the wire — the failure is “silent” in the protocol sense.

Asks for the team:

  1. What does empty usageMetadata + immediate generationComplete with no modelTurn mean? It’s not documented as a valid response.
  2. Is this what surfaces on the dashboard as 409 Conflict, or something else?
  3. Can the protocol return an error event for this case instead of an empty turn so clients can distinguish it from “model legitimately produced
    nothing”?

Confirming the same silent failure on our voice production path (Selma — agentic voice agent over Vertex AI Live API, gemini-live-2.5-flash-native-audio).

Pattern matches @flounder’s report exactly:

  • Bidi WebSocket via Vertex us-central1-aiplatform.googleapis.com Live API.
  • responseModalities:[“AUDIO”], voice Aoede, automaticActivityDetection:{disabled:true}, outputAudioTranscription:{}, sessionResumption:{handle:null}.
  • On a failing turn: sessionResumptionUpdategenerationCompleteturnComplete with empty usageMetadata{}. No modelTurn, no audio, no transcription. promptTokenCount=0.
  • WS close 1005, no error event on the wire.
  • Bursty: a short window of silent failures, then minutes of clean 100% success on identical config / byte-identical wire-out.
  • Fresh connection with brand-new setup right after the failure reproduces the same empty 3-event pattern, so it’s not stale client state.

We see this on both the AI Studio surface (generativelanguage.googleapis.com) and Vertex AI (us-central1-aiplatform.googleapis.com) — same model, same behavior. Eight-turn smoke passes cleanly, then a single user turn drops silent and the model is “lost” until the burst window passes.

+1 to the asks:

  1. Empty usageMetadata + immediate generationComplete is undocumented and indistinguishable from a legitimate “model produced nothing” — it breaks our retry/fallback logic and corrupts our SLO graphs (“successful turn, 0 tokens, 0 audio”).
  2. A mapping between this wire pattern and what surfaces on the dashboard (409 Conflict / no-quota-issue) would help — currently the failures are invisible at the API metrics layer.
  3. An explicit error event (or even a non-empty usageMetadata flag) would let clients distinguish a protocol-level silent drop from a legitimate empty output and trigger the right handling (text fallback vs. retry vs. escalate).

Happy to share full wire dumps from a failing window if a Google engineer wants to triage.