Hi Google AI team,
I’m seeing a significant latency increase after migrating off gemini-2.0-flash-exp, which has been deprecated.
Previous setup (working well):
- Model: gemini-2.0-flash-exp
- Use case: near-real-time audio / conversational interaction
- Average latency: ~500–900ms
- Performance: smooth and usable for real-time interaction
Current setup:
- Model: gemini-2.5-flash-native-audio-preview-12-2025
- Average latency: ~1400–1800ms
- Latency occurs consistently on every request
Environment:
- API/Product: Gemini API
- Region: South Asia (Pakistan)
- Client: Realtime / streaming usage
- Network RTT to Google endpoints: stable and low
- Issue reproducible across multiple sessions
Issue description:
After switching to the currently available Flash audio model, latency has increased by ~2x compared to the deprecated Flash 2.0 experimental model. This makes real-time voice interaction noticeably slower and impacts UX significantly.
Expected behavior:
Latency comparable to gemini-2.0-flash-exp (~sub-1s) for real-time audio use cases.
Actual behavior:
Consistent latency in the 1400–1800ms range, even with similar prompt sizes and audio input.
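For reference, latency per request was measured as wall-clock time from sending the final audio chunk to receiving the first response token. A minimal sketch of the measurement harness is below; the `send_request` callable is a stand-in for the actual streaming client call, not real client code:

```python
import time
import statistics

def measure_latency(send_request, n_trials=20):
    """Time n_trials blocking calls and return summary stats in milliseconds.

    send_request: callable that sends one request and blocks until the
    first response token arrives (stand-in for the real streaming client).
    """
    latencies_ms = []
    for _ in range(n_trials):
        start = time.perf_counter()
        send_request()
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "p50_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }

# Usage with a stub that simulates a round trip:
stats = measure_latency(lambda: time.sleep(0.01), n_trials=5)
print(stats)
```

With the real client substituted for the stub, the p50 figure is what the ~1400–1800ms range above refers to.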
Additional notes:
- No change in client code or infrastructure besides the model name
- The regression appears model-specific rather than network-related
Could you please confirm:
- Whether this latency increase is expected for the new native audio Flash model?
- Whether optimizations are planned to bring latency closer to the previous Flash 2.0 levels?
- Whether another model is recommended for low-latency, real-time audio use cases?
Thanks for your support.