That’s a great workaround! I assume that uses the audio billing though, which is more expensive? I also played with the transcript for incoming audio a little bit and it had quite a lot of accuracy issues. I’m not sure if the same thing would happen for output.
There are quite a few issues with this workaround:
- the model generates AUDIO tokens, which are considerably more expensive than TEXT tokens
- the `turn_complete` event arrives only after all AUDIO tokens are generated; for long sentences that can take 10 seconds or even longer. You could use the `generation_complete` event instead, but that only shortens the response time for the first user utterance, as the model is not ready to process new input until `turn_complete` is emitted
- the model sometimes simply fails to generate ANY output, neither audio chunks nor output transcription, and just returns `turn_complete`; this happens for me quite often when I restore context and use tools to return extracts from documents (agentic RAG pipeline)
Bottom line: this all looks like one dirty hack rather than the clean API that the gemini-live-2.5-flash model used to have.
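To make the event-gating issue above concrete, here is a minimal sketch of the receive loop logic. The `ServerContent` dataclass is a hypothetical stand-in for the server messages the Live API actually delivers (in the real google-genai SDK these arrive via `session.receive()`); only the control flow is the point here:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical stand-in for the Live API's server_content payload;
# real messages come from session.receive() in the google-genai SDK.
@dataclass
class ServerContent:
    transcription_text: Optional[str] = None  # output_transcription text chunk
    generation_complete: bool = False
    turn_complete: bool = False

def collect_text(events: List[ServerContent]) -> Tuple[str, bool]:
    """Assemble the model's text from output-transcription chunks.

    Returns (text, got_any_output). The second flag captures the
    failure mode described above, where the model returns only
    turn_complete with no audio chunks and no transcription at all.
    """
    parts = []
    for ev in events:
        if ev.transcription_text:
            parts.append(ev.transcription_text)
        if ev.turn_complete:
            # You cannot send new input before this event, which is
            # why generation_complete alone doesn't help past the
            # first utterance.
            break
    return "".join(parts), bool(parts)
```

The flag makes the "empty turn" case detectable so you can retry or surface an error instead of silently returning nothing.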
That’s a great breakdown, thanks for sharing. So I guess we just keep using it and hope they don’t pull the model out from under our feet? Hahaha. I’m still holding out hope we will get some official response from Google about what to do here.
We are past the official deadline - and they pulled it from “models” page - so I wouldn’t hold my breath…
Yeah this is an unfortunate turn of events… One of the projects at my work relies on Gemini Live Text Out so we need a Google fix on this ASAP..
@Lalit_Kumar is there still no update on this?
Yup, it looks like it was just killed. @Lalit_Kumar @Liam_Carter can anyone help?
I’m going to grab a $35 support plan on Google Cloud and see if they can do anything to help.
You’re amazing for that thank you!! Please let us know what they say.
The website is also still showing the lite version of the model as available, but it’s not working for me either.
Isn’t lite only for the mainline api, not live?
Yes, it’s not ideal, and I would also prefer to get plain text output directly. Hopefully they’ll find a proper solution soon.
That said, it hasn’t worked too badly for me so far. I instructed the LLM to output JSON, and up to now the responses have been valid. The real-time speed was acceptable as well.
Since the other models were just turned off, there isn’t much of an alternative at the moment, so this workaround will have to do. I’m not exactly happy about it, but for my purposes it works well enough until they provide a proper fix.
+1 on this issue… 2 days after the deprecation, I’m surprised more people aren’t complaining
Yeah, I added a workaround of using the REST version instead of real-time, but it’s about 3x slower, especially when handling function calls. I can’t take the hit of using audio tokens for output, so I need to figure out another way. If they don’t fix it, I’m thinking I will have to jump over to OpenAI.
It really seems like quite the miss. Even the updated model page says that the native audio models support text in and text out.
For anyone wondering, when the model stopped working, this is the message you got back.
```
models/gemini-live-2.5-flash-preview is not found for API version v1beta, or is not supported for bidiGenerateContent
```
I’m mostly posting that so hopefully anyone who Google-searches that error will find this thread.
+1 here. gemini-live-2.5-flash-preview was a reliable model for getting text output with streaming audio input. The audio generated by the newer models is just not good enough: it’s inconsistent, and the pronunciation for non-English languages is sub-par.
Using text output with Chirp 3 text-to-speech is working well in our business use cases, but the instability of these APIs and of model availability is making us reconsider Gemini.
yeah so my company just jumped ship to OpenAI’s Realtime API, they support audio in, text out with 4o Realtime. Not ideal but it works. Wasn’t expecting this bad of a support response from Google but it is what it is I suppose
I advise anyone else who’s facing this to jump ship even if it’s paid, cause Google won’t resolve this until Gemini 3 Live drops, and even then I’m skeptical that it’ll work like 2.5 Live.
I have a support thread running with the Google Cloud team and they are going to escalate to engineering to ask.
I added a REST API implementation of the same model and my latency jumped from 1.5 seconds on average to 7s.
I also learned that the Live API is the only one that supports the use of tools, like Google Search grounding, and custom functions. That’s critical for my use case, since I’m building a Google Home style AI assistant for Home Assistant. So I need it to know current information from search, but also control the user’s home with my custom function calls.
So I’m hopeful my support case will get somewhere and I can keep using Gemini. I switched from OpenAI about 6 months ago bc flash 2.5 cost less and performed way better than the current realtime OpenAI, which I think is still 4o based.
Honestly I’m only using OpenAI now because i checked and there’s a new model called GPT-Realtime which is supposedly based on 5? It’s pretty solid tbh but the migration is a bit of a headache cause the systems are kinda different and function call arrangements are set up differently.
Overall worth it tho
Luckily, when I ported to Gemini, I made an abstraction for the function call logic, so for me, switching that back should be quite straightforward. I’m going to prototype switching today.
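An abstraction like the one described above can be sketched roughly as follows. The type names and the two adapter functions are hypothetical illustrations (not from either SDK); the idea is just that each provider adapter normalizes its own wire format into one shape before dispatch:

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class ToolCall:
    """Provider-neutral function call: a name plus parsed arguments."""
    name: str
    arguments: Dict[str, Any]

class ToolRegistry:
    """Maps tool names to handlers; dispatch never sees provider details."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def dispatch(self, call: ToolCall) -> Any:
        return self._tools[call.name](**call.arguments)

def from_openai(raw: dict) -> ToolCall:
    # OpenAI-style calls deliver arguments as a JSON-encoded string.
    return ToolCall(raw["name"], json.loads(raw["arguments"]))

def from_gemini(raw: dict) -> ToolCall:
    # Gemini function calls deliver arguments as an already-parsed mapping.
    return ToolCall(raw["name"], dict(raw["args"]))
```

Switching providers then only means swapping which adapter feeds the registry; the tool handlers themselves stay untouched.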
The Google Cloud support person I have is very helpful, and they sent a summary of the issue to the engineering team, so I’m hoping we might get something concrete back soon.