
I’m evaluating the Gemini Live API (via Vertex AI or the Gemini API) for a hypothetical MVP prototype: real-time speech-to-speech voice conversations with Gemini.

Scenario details:

  • 100 concurrent users
  • Each user has about 1 hour of active conversation per day
  • Estimated total: ~180,000 active conversation minutes per month (100 users × 60 min/day × 30 days)
  • Using VAD (voice activity detection), assuming only speaking time is billed (not silence)
  • Likely using models like gemini-2.5-flash-live or similar for low-latency voice
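The monthly figure follows directly from the numbers above. Here is a minimal Python sketch of the budget arithmetic. It assumes every user really talks for the full 60 minutes each day, and it uses the unverified $0.011–0.012/min range from question 1 below as the rate, not an official price.

```python
# Back-of-the-envelope budget for the scenario above.
# ASSUMPTION: the $/min range is the unverified estimate quoted in question 1,
# not an official Google price; replace it with the current pricing-page figure.

USERS = 100
MINUTES_PER_USER_PER_DAY = 60
DAYS_PER_MONTH = 30


def monthly_minutes(users: int, minutes_per_day: int, days: int) -> int:
    """Total active conversation minutes per month."""
    return users * minutes_per_day * days


def monthly_cost(minutes: int, usd_per_minute: float) -> float:
    """Monthly spend at a flat effective per-minute rate."""
    return minutes * usd_per_minute


if __name__ == "__main__":
    minutes = monthly_minutes(USERS, MINUTES_PER_USER_PER_DAY, DAYS_PER_MONTH)
    print(f"{minutes:,} min/month")  # 180,000 min/month
    for rate in (0.011, 0.012):
        print(f"${rate}/min -> ${monthly_cost(minutes, rate):,.2f}/month")
    # -> $1,980.00/month and $2,160.00/month
```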

Specific questions:

  1. What is the effective per-minute cost of a full-duplex voice conversation (input audio + output audio + processing)? Is it around $0.011–0.012/min, as some docs/calculations suggest, or has it changed?
  2. What are the current concurrency limits for the Gemini Live API (e.g., maximum simultaneous WebSocket connections per project/region, by Tier 1/2/3)?
    • Can it handle 100 concurrent live sessions reliably?
    • Any extra charges or setup needed for higher concurrency?
  3. How exactly is billing calculated for the Live API? (Tokens per second of audio? Input and output billed separately? Any flat fees?)
  4. Are there any preview limitations, regional restrictions, or best practices for scaling voice agents to this level?
  5. Any recommendations for partners/integrations (such as Daily.co) for a web/mobile voice frontend?
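Questions 1 and 3 reduce to a single formula if Live API audio is billed per token, with input and output audio priced separately. The sketch below makes that model explicit. The tokens-per-second rate and the per-million-token prices are hypothetical placeholders, not Google's published numbers. The talk fractions are assumptions about how much of each minute the user and the model actually spend speaking. Substitute the current values from the Vertex AI / Gemini API pricing pages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioRates:
    """Billing inputs. ALL VALUES ARE HYPOTHETICAL PLACEHOLDERS, not official prices."""

    tokens_per_second: float        # audio tokens per second of audio (assumed equal for input and output)
    usd_per_m_input_tokens: float   # price per 1M input audio tokens
    usd_per_m_output_tokens: float  # price per 1M output audio tokens


def cost_per_minute(rates: AudioRates, user_talk_fraction: float, model_talk_fraction: float) -> float:
    """Effective USD per minute of conversation, counting audio tokens only.

    The talk fractions are the share of each minute in which the user (input)
    or the model (output) is speaking; with VAD, silence is assumed unbilled.
    Text tokens, system instructions and any re-processed context are ignored.
    """
    seconds = 60.0
    input_tokens = seconds * user_talk_fraction * rates.tokens_per_second
    output_tokens = seconds * model_talk_fraction * rates.tokens_per_second
    return (input_tokens * rates.usd_per_m_input_tokens
            + output_tokens * rates.usd_per_m_output_tokens) / 1_000_000


if __name__ == "__main__":
    # Placeholder numbers only; substitute the current published rates.
    rates = AudioRates(tokens_per_second=25.0, usd_per_m_input_tokens=1.0, usd_per_m_output_tokens=4.0)
    per_min = cost_per_minute(rates, user_talk_fraction=0.5, model_talk_fraction=0.5)
    print(f"${per_min:.5f}/min")  # $0.00375/min with these placeholders
```

Keeping input and output as separate terms shows how the effective rate shifts depending on who does most of the talking. A single flat $/min figure hides that shift.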
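For question 2, a common client-side safeguard applies whatever the actual per-project session quota turns out to be. The idea is to cap how many Live sessions the backend opens at once, so that bursts wait for a free slot instead of failing at connection time. Below is a minimal asyncio sketch. `MAX_SESSIONS` and `run_live_session` are hypothetical stand-ins: the real session would be opened through the Gen AI SDK's Live client, which is not shown here.

```python
import asyncio

MAX_SESSIONS = 100  # HYPOTHETICAL: set to the concurrency quota actually granted for your project


async def run_live_session(user_id: int) -> str:
    """Hypothetical placeholder for one Live API conversation."""
    await asyncio.sleep(0.01)  # stands in for streaming audio over the WebSocket
    return f"user-{user_id} done"


class SessionGate:
    """Caps concurrent sessions and records the peak for capacity planning."""

    def __init__(self, limit: int) -> None:
        self._sem = asyncio.Semaphore(limit)
        self.active = 0
        self.peak = 0

    async def run(self, user_id: int) -> str:
        async with self._sem:  # waits here once `limit` sessions are open
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                return await run_live_session(user_id)
            finally:
                self.active -= 1


async def main(n_users: int, limit: int) -> tuple[list[str], int]:
    gate = SessionGate(limit)
    results = await asyncio.gather(*(gate.run(i) for i in range(n_users)))
    return results, gate.peak


if __name__ == "__main__":
    results, peak = asyncio.run(main(n_users=150, limit=MAX_SESSIONS))
    print(len(results), peak)  # 150 100
```

With this design, the 101st user waits for a slot to free up. For a voice product, you may prefer to fail fast with a "busy" message instead. Checking `Semaphore.locked()` before acquiring supports that.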

This is for early prototyping/MVP budgeting. Any guidance, an updated pricing sheet, or a quota increase path would be super helpful.