Hi Google AI team,
I’m seeing a significant latency increase after migrating off gemini-2.0-flash-exp, which has been deprecated.
Previous setup (working well):
- Model: gemini-2.0-flash-exp
- Use case: near-real-time audio / conversational interaction
- Average latency: ~500–900ms
- Performance: smooth and usable for real-time interaction
Current setup:
- Model: gemini-2.5-flash-native-audio-preview-12-2025
- Average latency: ~1400–1800ms
- Latency occurs consistently on every request
Environment:
- API/Product: Gemini API
- Region: South Asia (Pakistan)
- Client: Realtime / streaming usage
- Network RTT to Google endpoints: stable and low
- Issue reproducible across multiple sessions
Issue description:
After switching to the currently available Flash audio model, latency has increased by ~2x compared to the deprecated Flash 2.0 experimental model. This makes real-time voice interaction noticeably slower and impacts UX significantly.
Expected behavior:
Latency comparable to gemini-2.0-flash-exp (~sub-1s) for real-time audio use cases.
Actual behavior:
Consistent latency in the 1400–1800ms range, even with similar prompt sizes and audio input.
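For reference, latency per request was measured as wall-clock time from sending the final audio chunk to receiving the first response token. A minimal sketch of the measurement harness is below; the `send_request` callable is a stand-in for the actual streaming client call, not real client code:

```python
import time
import statistics

def measure_latency(send_request, n_trials=20):
    """Time n_trials blocking calls and return summary stats in milliseconds.

    send_request: callable that sends one request and blocks until the
    first response token arrives (stand-in for the real streaming client).
    """
    latencies_ms = []
    for _ in range(n_trials):
        start = time.perf_counter()
        send_request()
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "p50_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }

# Usage with a stub that simulates a round trip:
stats = measure_latency(lambda: time.sleep(0.01), n_trials=5)
print(stats)
```

With the real client substituted for the stub, the p50 figure is what the ~1400–1800ms range above refers to.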
Additional notes:
- No change in client code or infrastructure besides the model name
- The regression appears model-specific rather than network-related
Could you please confirm:
- Whether this latency increase is expected for the new native audio Flash model?
- Whether optimizations are planned to bring latency closer to the previous Flash 2.0 levels?
- Whether another model is recommended for low-latency, real-time audio use cases?
Thanks for your support.