[Live API] gemini-live-2.5-flash-native-audio returns no output after setupComplete; gemini-2.0-flash-live-001 not accessible

Summary

Vertex AI Live API (bidiGenerateContent) is not functioning for our project on any of the available Live models. We have a working voice agent that broke between 2026-04-28 evening and 2026-04-29 morning.

Project details

  • Regions tested: us-central1 (also us-east5, us-east1, us-west1, europe-west1, europe-west4, asia-northeast1 — same behavior)
  • Service Account: vertex-express@selma-voice-prod.iam.gserviceaccount.com
  • IAM role: roles/aiplatform.user (Vertex AI User) — granted today, verified to take effect (text-only gemini-2.5-flash:generateContent returns 200 OK in this project)
  • Vertex AI API: Enabled
  • Billing: Active (paid, since 2026-04-26)
  • SDK: google-genai Python 1.73.1 and Node.js 1.50.1
  • API version: v1beta1

Use case

Real-time voice secretary that talks to users in Russian/Ukrainian over WebSocket via client.aio.live.connect(). On 2026-04-28 it was working end-to-end with gemini-2.0-flash-live-001 at ~1.4 s TTFB. By the morning of 2026-04-29 it had stopped, with a different symptom on each model.

Issue 1 — gemini-2.0-flash-live-001 returns 1008 not-found

client.aio.live.connect(model='gemini-2.0-flash-live-001', config={...})
→ APIError 1008: Publisher Model 'projects/selma-voice-prod/locations/us-central1/publishers/google/models/gemini-2.0-flash-live-001' was not found

Reproduced in 7 regions with identical message. Same for gemini-2.0-flash-live-preview-04-09, gemini-2.0-flash-live, gemini-2.0-flash-live-preview. Text-only gemini-2.5-flash:generateContent works (200 OK), so this is specific to Live API access for 2.0 models.
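Minimal standalone repro sketch for Issue 1 (the `publisher_model_path` helper is ours, added only to show the resource path quoted back in the error; the SDK import is deferred so the helper runs without google-genai installed):

```python
# Repro sketch for Issue 1. Assumes google-genai Python SDK; project and
# location are our real values from this report.
import asyncio


def publisher_model_path(project: str, location: str, model: str) -> str:
    # Resource path format quoted back in the 1008 "was not found" error.
    return (f"projects/{project}/locations/{location}"
            f"/publishers/google/models/{model}")


async def main() -> None:
    from google import genai  # deferred so the helper above is importable alone

    client = genai.Client(vertexai=True,
                          project="selma-voice-prod",
                          location="us-central1")
    try:
        async with client.aio.live.connect(
            model="gemini-2.0-flash-live-001",
            config={"response_modalities": ["AUDIO"]},
        ) as session:
            print("connected")  # never reached in our runs
    except Exception as e:
        # Observed: APIError 1008, Publisher Model '<path>' was not found,
        # where <path> matches:
        print(publisher_model_path("selma-voice-prod", "us-central1",
                                   "gemini-2.0-flash-live-001"))
        print("error:", e)


if __name__ == "__main__":
    asyncio.run(main())
```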

Issue 2 — gemini-live-2.5-flash-native-audio connects but produces 0 server output

This is the only Live API model that successfully establishes a session for our project.

Reproduction config:

{
  "response_modalities": ["AUDIO"],
  "system_instruction": "You are a helpful assistant. Reply briefly in Russian.",
  "input_audio_transcription": {},
  "output_audio_transcription": {},
  "speech_config": {
    "voice_config": {"prebuilt_voice_config": {"voice_name": "Aoede"}},
    "language_code": "ru-RU"
  }
}
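For completeness, this is the same config as we build it in Python and pass to client.aio.live.connect(config=...) (a plain dict; the SDK coerces it to its typed config — nothing here is project-specific):

```python
# Builds the exact connect config used in the smoke runs; values mirror the
# JSON above. Voice and language are parameterized only for convenience.
def build_live_config(voice: str = "Aoede", language: str = "ru-RU") -> dict:
    return {
        "response_modalities": ["AUDIO"],
        "system_instruction": "You are a helpful assistant. Reply briefly in Russian.",
        # Empty objects request transcription for both audio directions.
        "input_audio_transcription": {},
        "output_audio_transcription": {},
        "speech_config": {
            "voice_config": {"prebuilt_voice_config": {"voice_name": voice}},
            "language_code": language,
        },
    }
```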

Observed:

  • WebSocket connect: OK (~200 ms)
  • setupComplete received: OK
  • Client → server: PCM 16-bit 16 kHz, 30 ms frames, audio/pcm;rate=16000, 121 chunks, 116976 bytes (3.6 s of Russian speech): OK
  • Server → client: 0 modelTurn, 0 inputTranscription, 0 outputTranscription, 0 turnComplete, 0 generationComplete for ≥30 s before timeout/disconnect.

Reproduced in 5+ standalone Python smoke runs today on Vertex (SA auth) and AI Studio (api_key) — same silent behavior. Tried with/without inputAudioTranscription, with/without speechConfig.languageCode, with/without prebuiltVoiceConfig, with/without speech audio (silent vs active). All produce zero serverContent.
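The core of the smoke script looks like this (a sketch: `session` is an already-open Live session, and the dict audio payload is what we pass to send_realtime_input; requires Python 3.11+ for asyncio.timeout). In every run `received` stays 0:

```python
import asyncio

# 16 kHz, 16-bit mono, 30 ms per frame -> 960 bytes per frame.
FRAME_BYTES = 16000 * 2 * 30 // 1000


def chunk_pcm(pcm: bytes, frame_bytes: int = FRAME_BYTES) -> list[bytes]:
    # Split raw PCM into fixed-size frames; the last frame may be short.
    return [pcm[i:i + frame_bytes] for i in range(0, len(pcm), frame_bytes)]


async def smoke(session, pcm: bytes, timeout_s: float = 30.0) -> int:
    # Stream the audio, then count any server messages until timeout.
    for frame in chunk_pcm(pcm):
        await session.send_realtime_input(
            audio={"data": frame, "mime_type": "audio/pcm;rate=16000"})
    received = 0
    try:
        async with asyncio.timeout(timeout_s):
            async for _msg in session.receive():
                received += 1  # observed: never incremented
    except TimeoutError:
        pass
    return received
```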

Issue 3 — gemini-2.5-flash on Live → “is not supported”

client.aio.live.connect(model='gemini-2.5-flash', ...)
→ APIError 1007: gemini-2.5-flash is not supported in the live api

Confirms there is no working Live API model in our project — only gemini-live-2.5-flash-native-audio, and it does not produce server output.

What we need

  1. Confirm whether gemini-2.0-flash-live-001 is still GA for new projects, and if so, restore access for selma-voice-prod. If it has been deprecated globally, please document the deprecation and recommend a stable Live replacement that supports asynchronous tool calling.

  2. Investigate why gemini-live-2.5-flash-native-audio yields zero serverContent in our project despite a clean WebSocket setup and steady client → server PCM stream.

  3. Recommend a stable, GA Live API model on Vertex AI for production with Russian/Ukrainian audio and asynchronous tool calling. SDKs we use: google-genai Python 1.73.1, Node.js 1.50.1.

Happy to share the standalone smoke script (smoke_vertex.py), the test PCM audio, and full session logs on request.

Thanks,
Igor Sokhinov