Live API discontinuation for gemini-live-2.5-flash-preview — degraded behavior, higher hallucinations, and no clear replacement?

Hey folks,

We’ve been using gemini-live-2.5-flash-preview in our production system for quite some time. After the Live API discontinuation notice, we tried moving to the newer native audio models. Unfortunately, the behavior is completely different:

  • We only need text-to-text modality (no audio input/output), but the new model doesn’t seem compatible with that setup.

  • The model hallucinates more, struggles to follow structured commands, and overall feels much less stable compared to the previous half-cascade flash-preview.

  • It also seems totally incompatible with the previous half-cascade pipeline — we can’t replicate the same latency, response style, or turn-taking behavior.

Has anyone here had access to Gemini Flash Private GA?
I’m curious whether:

  1. It behaves closer to 2.5-flash-preview in terms of response quality,

  2. It’s possible to get into the private GA program,

  3. Or if Google has shared any plans to bring back half-cascade-style models for text-only streaming.

Would really appreciate hearing from anyone who’s been in contact with the Gemini team or experimenting with the new versions. Right now, it feels like there’s no drop-in alternative for the old gemini-live-2.5-flash-preview if your system depends on fast, text-only streaming.

Regarding text-to-text input and output, I asked about this a while back and didn’t get an answer.

Hopefully they give us something soon; otherwise, when the model is deprecated, I guess I will go back to OpenAI. I’m hoping a model update will be announced soon that includes a half-cascade text-to-text model.
