Summary
The Vertex AI Live API (`bidiGenerateContent`) is not functioning for our project on any of the available Live models. Our previously working voice agent broke between the evening of 2026-04-28 and the morning of 2026-04-29.
Project details
- Regions tested: us-central1 (also us-east5, us-east1, us-west1, europe-west1, europe-west4, asia-northeast1 — same behavior)
- Service account: `vertex-express@selma-voice-prod.iam.gserviceaccount.com`
- IAM role: `roles/aiplatform.user` (Vertex AI User) — granted today and verified to be in effect (text-only `gemini-2.5-flash:generateContent` returns 200 OK in this project)
- Vertex AI API: enabled
- Billing: active (paid, since 2026-04-26)
- SDK: `google-genai` Python 1.73.1 and Node.js 1.50.1
- API version: `v1beta1`
Use case
A real-time voice secretary that talks to users in Russian/Ukrainian over WebSocket via `client.aio.live.connect()`. On 2026-04-28 it was working end-to-end with `gemini-2.0-flash-live-001` at ~1.4 s TTFB. By the morning of 2026-04-29 it had stopped, with a different symptom on each model.
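For context, a minimal sketch of the session flow we rely on. Names like `make_live_config` are hypothetical illustrations, the config is trimmed (the full audio config is under Issue 2), and running `main()` requires Application Default Credentials for the project:

```python
# Sketch of the Live API session flow (google-genai Python SDK).
import asyncio

PROJECT = "selma-voice-prod"
LOCATION = "us-central1"
MODEL = "gemini-2.0-flash-live-001"

def make_live_config() -> dict:
    """Hypothetical helper: trimmed version of our connect() config."""
    return {
        "response_modalities": ["AUDIO"],
        "system_instruction": "You are a helpful assistant. Reply briefly in Russian.",
    }

async def main() -> None:
    # Imported here so make_live_config stays dependency-free.
    from google import genai

    client = genai.Client(vertexai=True, project=PROJECT, location=LOCATION)
    async with client.aio.live.connect(model=MODEL, config=make_live_config()) as session:
        # In production we also stream PCM frames to the session here.
        async for message in session.receive():
            if message.server_content:
                print(message.server_content)

# asyncio.run(main())  # uncomment to run; needs Vertex AI credentials
```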
Issue 1 — gemini-2.0-flash-live-001 returns 1008 not-found
```
client.aio.live.connect(model='gemini-2.0-flash-live-001', config={...})
→ APIError 1008: Publisher Model 'projects/selma-voice-prod/locations/us-central1/publishers/google/models/gemini-2.0-flash-live-001' was not found
```
Reproduced in 7 regions with an identical message. The same happens for `gemini-2.0-flash-live-preview-04-09`, `gemini-2.0-flash-live`, and `gemini-2.0-flash-live-preview`. Text-only `gemini-2.5-flash:generateContent` works (200 OK), so the failure is specific to Live API access for the 2.0 models.
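The resource name quoted in the 1008 error is fully determined by project, location, and model, so the per-region error messages differ only in the location segment. A small helper (hypothetical name, not part of our production code) that reconstructs it:

```python
# Rebuilds the publisher-model resource name from the 1008 error body,
# useful for asserting that region sweeps fail with the same message.
def publisher_model_path(project: str, location: str, model: str) -> str:
    return (
        f"projects/{project}/locations/{location}"
        f"/publishers/google/models/{model}"
    )

print(publisher_model_path("selma-voice-prod", "us-central1",
                           "gemini-2.0-flash-live-001"))
```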
Issue 2 — gemini-live-2.5-flash-native-audio connects but produces 0 server output
This is the only Live API model that successfully establishes a session for our project.
Reproduction config:
```json
{
  "response_modalities": ["AUDIO"],
  "system_instruction": "You are a helpful assistant. Reply briefly in Russian.",
  "input_audio_transcription": {},
  "output_audio_transcription": {},
  "speech_config": {
    "voice_config": {"prebuilt_voice_config": {"voice_name": "Aoede"}},
    "language_code": "ru-RU"
  }
}
```
Observed:
- WebSocket connect: ~200 ms; `setupComplete` received.
- Client → server: PCM 16-bit 16 kHz, 30 ms frames, `audio/pcm;rate=16000`, 121 chunks, 116976 bytes (3.6 s of Russian speech).
- Server → client: 0 `modelTurn`, 0 `inputTranscription`, 0 `outputTranscription`, 0 `turnComplete`, 0 `generationComplete` for ≥30 s before timeout/disconnect.
Reproduced in 5+ standalone Python smoke runs today on both Vertex (SA auth) and AI Studio (`api_key`) — same silent behavior. Tried with/without `inputAudioTranscription`, with/without `speechConfig.languageCode`, with/without `prebuiltVoiceConfig`, and with silent vs. active speech audio. All produce zero `serverContent`.
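For completeness, the client → server framing above follows from plain arithmetic over 16-bit 16 kHz PCM. A sketch of the chunking (hypothetical names, not our production code):

```python
# 30 ms of 16 kHz 16-bit mono PCM = 16000 * 0.030 * 2 = 960 bytes per frame.
SAMPLE_RATE_HZ = 16000
BYTES_PER_SAMPLE = 2  # PCM 16-bit
FRAME_MS = 30

def frame_size_bytes() -> int:
    return SAMPLE_RATE_HZ * FRAME_MS // 1000 * BYTES_PER_SAMPLE  # 960

def chunk_pcm(pcm: bytes):
    """Yield 30 ms frames; the final frame may be shorter."""
    size = frame_size_bytes()
    for i in range(0, len(pcm), size):
        yield pcm[i:i + size]

# Exactly 3.6 s of audio splits into 120 full frames.
pcm = b"\x00" * (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * 36 // 10)
frames = list(chunk_pcm(pcm))
```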
Issue 3 — gemini-2.5-flash on Live → “is not supported”
```
client.aio.live.connect(model='gemini-2.5-flash', ...)
→ APIError 1007: gemini-2.5-flash is not supported in the live api
```
This confirms there is no working Live API model in our project: only `gemini-live-2.5-flash-native-audio` establishes a session, and it produces no server output.
What we need
- Confirm whether `gemini-2.0-flash-live-001` is still GA for new projects, and if so, restore access for `selma-voice-prod`. If it is globally deprecated, please publish that and recommend a stable Live replacement that supports asynchronous tool calling.
- Investigate why `gemini-live-2.5-flash-native-audio` yields zero `serverContent` in our project despite a clean WebSocket setup and a steady client → server PCM stream.
- Recommend a stable, GA Live API model on Vertex AI for production use with Russian/Ukrainian audio and asynchronous tool calling. SDKs we use: `google-genai` Python 1.73.1, Node.js 1.50.1.
Happy to share the standalone smoke script (smoke_vertex.py), the test PCM audio, and full session logs on request.
Thanks,
Igor Sokhinov