Latency regression after deprecation of gemini-2.0-flash-exp (~500–900ms → ~1400–1800ms)

Hi Google AI team,

I’m experiencing a significant latency increase after migrating models due to the deprecation of gemini-2.0-flash-exp.

Previous setup (working well):

  • Model: gemini-2.0-flash-exp
  • Use case: near-real-time audio / conversational interaction
  • Average latency: ~500–900ms
  • Performance: smooth and usable for real-time interaction

Current setup:

  • Model: gemini-2.5-flash-native-audio-preview-12-2025
  • Average latency: ~1400–1800ms
  • The higher latency occurs consistently on every request

Environment:

  • API/Product: Gemini API
  • Region: South Asia (Pakistan)
  • Client: Realtime / streaming usage
  • Network RTT to Google endpoints: stable and low
  • Issue reproducible across multiple sessions

Issue description:
After switching to the currently available Flash audio model, latency has increased by ~2x compared to the deprecated Flash 2.0 experimental model. This makes real-time voice interaction noticeably slower and impacts UX significantly.

Expected behavior:
Latency comparable to gemini-2.0-flash-exp (under 1s) for real-time audio use cases.

Actual behavior:
Consistent latency in the 1400–1800ms range, even with similar prompt sizes and audio input.

Additional notes:

  • No change in client code or infrastructure besides the model name
  • The regression appears model-specific rather than network-related
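
For anyone who wants to reproduce the comparison, here is a minimal sketch of one way time-to-first-chunk latency could be measured and summarized per model. It is not necessarily how the figures above were collected. The names `time_to_first_chunk`, `measure`, `summarize`, and `fake_stream` are hypothetical, and `fake_stream` is only a stand-in for a real streaming call, which would have to be swapped in for each model under test.

```python
import asyncio
import math
import statistics
import time
from typing import AsyncIterator, Callable, Dict, List


async def time_to_first_chunk(open_stream: Callable[[], AsyncIterator[bytes]]) -> float:
    """Return the milliseconds between opening the stream and its first chunk."""
    start = time.perf_counter()
    async for _chunk in open_stream():
        # Stop at the first chunk: this is the latency a listener perceives.
        return (time.perf_counter() - start) * 1000.0
    raise RuntimeError("stream ended without yielding any chunk")


async def measure(open_stream: Callable[[], AsyncIterator[bytes]], runs: int) -> List[float]:
    """Run the measurement several times, sequentially, and collect the samples."""
    return [await time_to_first_chunk(open_stream) for _ in range(runs)]


def summarize(samples_ms: List[float]) -> Dict[str, float]:
    """Summarize latency samples with min, median, nearest-rank p95, and max."""
    ordered = sorted(samples_ms)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }


async def fake_stream(delay_s: float) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a model's audio stream (NOT a real API call)."""
    await asyncio.sleep(delay_s)  # simulated model response delay
    yield b"audio-chunk"


if __name__ == "__main__":
    samples = asyncio.run(measure(lambda: fake_stream(0.05), runs=5))
    print(summarize(samples))
```

Running the same harness against both models from the same machine and network, with identical audio input, helps separate model-side latency from network effects.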

Could you please confirm:

  1. Whether this latency increase is expected for the new native audio Flash model.
  2. Whether optimizations are planned to bring latency closer to the previous Flash 2.0 levels.
  3. Whether another model is recommended for low-latency real-time audio use cases.

Thanks for your support.

Same here, gemini-2.0-flash-exp is amazing. Can we have a model like that which doesn’t support tools or whatever additional capabilities you are adding for audio etc.? Give us a smaller LLM just for highly optimized STT. gemini-2.0-flash-live-preview-04-09 is nowhere near as good as gemini-2.0-flash-exp, and I assume 2.5 Live is worse and thus the push to do S2S. It’s not really what we need or want.
