I’m evaluating the Gemini Live API (via Vertex AI or the Gemini API) for a fictional MVP prototype: real-time voice conversations (speech-to-speech) with Gemini.
Scenario details:
- 100 simultaneous (concurrent) users
- Each user has about 1 hour of active conversation per day
- Total estimated: ~180,000 active conversation minutes per month (assuming 30 days)
- Using VAD (voice activity detection) so that, as I understand it, only speaking time is billed and silence is not
- Likely using models like gemini-2.5-flash-live or similar for low-latency voice
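For reference, the arithmetic behind the estimate above can be sketched as follows. This is only a rough budgeting sketch: the per-minute rates are the unconfirmed $0.011–0.012 figures I ask about below, not official pricing, and the function names are purely illustrative. With these inputs it works out to 180,000 minutes and roughly $1,980 to $2,160 per month.

```python
# Rough monthly budget arithmetic for the scenario above.
USERS = 100
MINUTES_PER_USER_PER_DAY = 60
DAYS_PER_MONTH = 30

# ASSUMPTION: unconfirmed per-minute rates taken from my own question,
# used as placeholders until real pricing is confirmed.
RATE_LOW_USD = 0.011
RATE_HIGH_USD = 0.012


def monthly_minutes(users: int, minutes_per_day: int, days: int) -> int:
    """Total active conversation minutes per month."""
    return users * minutes_per_day * days


def monthly_cost(minutes: int, rate_per_minute: float) -> float:
    """Monthly cost in USD at a flat per-minute rate, rounded to cents."""
    return round(minutes * rate_per_minute, 2)


minutes = monthly_minutes(USERS, MINUTES_PER_USER_PER_DAY, DAYS_PER_MONTH)
low = monthly_cost(minutes, RATE_LOW_USD)
high = monthly_cost(minutes, RATE_HIGH_USD)
print(f"{minutes:,} min/month -> ${low:,.2f} to ${high:,.2f}")
```

If the real billing is per token rather than per minute, only the rate constants and `monthly_cost` would need to change.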
Specific questions:
- What is the effective per-minute cost for a full-duplex voice conversation (input audio + output audio + processing)? Is it around $0.011–0.012/min, as some docs and calculations suggest, or has it changed?
- What are the current concurrency limits for Gemini Live API? (e.g., max simultaneous WebSocket connections per project/region — Tier 1/2/3?)
- Can it handle 100 concurrent live sessions reliably?
- Any extra charges or setup needed for higher concurrency?
- How is billing calculated exactly for Live API? (tokens per second of audio? Input + output separately? Any flat fees?)
- Are there any preview limitations, regional restrictions, or best practices for scaling voice agents to this level?
- Any recommendations for partners/integrations (like Daily.co) for a web/mobile voice frontend?
This is for early prototyping/MVP budgeting — any guidance, updated pricing sheet, or quota increase path would be super helpful.