Gemini-2.5-flash-native-audio-preview-09-2025 Text -> Text Only Not Working

Hello! I’m currently using gemini-live-2.5-flash-preview to power my website Homeway. I’m using dotnet, so I wrote my own WebSocket impl because I couldn’t find one in the official SDK. I’m doing text → text chat completions, with tools, search grounding, etc. It’s all working well, and I have been very impressed with the latency.

I got an email saying that gemini-live-2.5-flash-preview was being replaced by `gemini-2.5-flash-native-audio-preview-09-2025`. But when I swap the model strings, my websocket is closed after I send the config object with the error:

Cannot extract voices from a non-audio request

My config object is set up with TEXT as the only output modality, and I don’t create any of the voice-related generation config subobjects.
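For reference, here is a minimal sketch of the setup message I send, rewritten as Python (my real client is in dotnet, so the shape below is my reconstruction of the JSON I send over the wire, not the actual code):

# Minimal sketch of the setup message (assumed field names, per the
# BidiGenerateContent WebSocket protocol). Only TEXT is requested and
# no voice/speech config is included anywhere.
setup_message = {
    "setup": {
        "model": "models/gemini-2.5-flash-native-audio-preview-09-2025",
        "generationConfig": {
            "responseModalities": ["TEXT"],
        },
    }
}

Sending this is enough to get the socket closed with the error above on the new model, while the same message works fine on gemini-live-2.5-flash-preview.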

The website says the audio-preview model supports text as input and output, so it seems like it should work. Do you have any idea what I need to do to fix this? Do I need to set up AUDIO as a possible output but never use it?

Thanks!


Hello,

You should still be able to use the gemini-live-2.5-flash-preview model. Could you please try this and see if it works for you?

Hey! Yes, I can still use the model right now. But I got this email on 10/14/25 saying:

What you need to know

The following two Gemini API models will be discontinued on December 9, 2025:

  • Gemini 2.0 Flash Live (gemini-2.0-live-001)

  • Gemini 2.5 Flash Live (gemini-live-2.5-flash-preview)

We have recently launched a new, updated preview version to replace the previous ones: Gemini 2.5 Flash Native Audio Preview (September 2025 version) (gemini-2.5-flash-native-audio-preview-09-2025)

This new model provides significant improvements in function calling and speech quality.

What you need to do

To avoid service disruption, please upgrade to the new model, Gemini 2.5 Flash Native Audio Preview (gemini-2.5-flash-native-audio-preview-09-2025) before December 9, 2025.

That’s why I was looking to update the model.

Same issue here. And also got a similar email.
Staying with gemini-live-2.5-flash-preview for now - but December 9 is not too far…


Yeah, it seems like an API problem. The model should be able to accept text in and text out as a valid config, but the API isn’t allowing it right now. But if they asked us to use the model and are deprecating the old one, it should be 100% supported.


I’m having similar issues. Looks like there are issues across the board for the new model.

Also, I’m getting terrible results from the new gemini-2.5-flash-native-audio-preview-09-2025 in general: it breaks up mid-sentence, gets stuck, speaks nonsense, and so on. I’m worried that they’re considering making this generally available.


@Lalit_Kumar do you have any guidance for us?

Hello,

Apologies for the delay in response. Could you please share your minimum reproducible code? We would like to run and analyze it on our end so we can assist you better.

@Lalit_Kumar Sorry for the delay. Here’s a quick single-file Python demo I made that reproduces the issue.

https://homeway.io/gemini-bug-demo.py

All you need to do is add an API key in the main() function and then toggle between the working model (the one I’m currently using, which is now deprecated) and the new model we’re being asked to use, which doesn’t work.
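In case the link ever goes stale, the demo boils down to something like the sketch below, written against the websockets package. The endpoint path, the ?key= query parameter, and the message shape are my understanding of the Live API WebSocket protocol rather than a verbatim copy of the script:

# Rough repro sketch: connect to the Live API WebSocket endpoint, send a
# text-only setup message, and print whatever comes back (or the close
# reason). Endpoint and message shape are assumptions based on the public
# BidiGenerateContent protocol, not the exact demo script.
import asyncio
import json

import websockets

API_KEY = "YOUR_API_KEY"  # add your key here
HOST = "generativelanguage.googleapis.com"
PATH = "/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent"

# Toggle between the old (working) model and the new one that fails.
MODEL = "models/gemini-2.5-flash-native-audio-preview-09-2025"
# MODEL = "models/gemini-live-2.5-flash-preview"


async def main() -> None:
    uri = f"wss://{HOST}{PATH}?key={API_KEY}"
    async with websockets.connect(uri) as ws:
        setup = {
            "setup": {
                "model": MODEL,
                "generationConfig": {"responseModalities": ["TEXT"]},
            }
        }
        await ws.send(json.dumps(setup))
        try:
            # Old model: the first reply is a setupComplete message.
            # New model: the socket closes with "Cannot extract voices ...".
            print(await ws.recv())
        except websockets.ConnectionClosed as exc:
            print("connection closed:", exc)


if __name__ == "__main__":
    asyncio.run(main())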


Any updates on this? We are pretty close to the EOL of gemini-live-2.5-flash

Same issue here. I’m using the WebSocket API with my own Kotlin implementation. It worked perfectly with gemini-2.0-live-001 (and still does), but simply switching the model name to gemini-2.5-flash-native-audio-preview-09-2025 doesn’t work when producing text output. I would really appreciate your support; I’m planning to conduct a study based on this implementation, and everything is already prepared. I just learned that the current model will be deprecated in three weeks, which really throws off all my plans.


I have the same issue here… Any news from Google?

Hi folks,

I’m having the same issue. I’m using gemini-live-2.5-flash-preview to input audio and output text, but it’s only allowing me Audio → Audio. Any idea how to solve this?

There haven’t been any responses yet, and the deadline is Dec 8th, which is 11 days from now. Any updates @Lalit_Kumar?

Also running into the same issue with our use case. Hoping they at least extend gemini-live-2.5-flash-preview availability until the issue with the native-audio model gets sorted.

Same. I’m also kind of hoping they drop Gemini 3.0 real-time before then, and I can switch to it. :smiley:


@Pannaga_J sorry for the random ping, but we are running out of time. Do you have any thoughts of what we need to do? I have a one file Python demo of the issue linked above.

Sorry for reposting this, but several other developers and I are waiting for a response on an issue with a December 7th deadline due to the gemini-live-2.5-flash-preview model’s deprecation.

In the thread, I posted a one-file Python script that demos the issue, so it should be quite easy to understand. As far as I can tell, after the model is deprecated on December 7th, there will be no way to do a text → text inference for the real-time models.

Any guidance would be fantastic! I’m hoping we can find a workaround or maybe push back the model deprecation until the issues are resolved.

Hey everyone,

I found a way to make it work, although it’s not fully satisfying, since you have to use "responseModalities": ["AUDIO"].

To use the new model and still get text as output, you need to set:

"outputAudioTranscription": {}

in the config. Here’s the full payload:

payload = {
    "setup": {
        "model": self._model,
        "systemInstruction": {
            "parts": [{
                "text": system_instruction
            }]
        },
        "generationConfig": {
            "candidateCount": 1,
            "maxOutputTokens": self._max_output_tokens,
            "temperature": self._temperature,
            "topP": self._top_p,
            "responseModalities": ["AUDIO"]
        },
        # This enables transcription of the model's audio output.
        "outputAudioTranscription": {}
    }
}

You can see this part in the API docs here.

Another thing I had to change is the URL for the API:

REALTIME_WSS_URL = (
    "wss://generativelanguage.googleapis.com/ws/"
    # "google.cloud.aiplatform.v1.LlmBidiService/BidiGenerateContent"  # this one didn't work for me
    "google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent"
)

I would have liked to link to the relevant part of the API docs as well, but I am only allowed to include two links…

I updated the script, and you can find it here.

I’m not entirely sure how the audio transcription process works internally, so take this fix with a grain of salt. I don’t know whether live performance is affected by enabling audio transcription or not.
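For anyone wiring this up: with this config the text comes back as a transcription of the audio output rather than as a normal text part. Here’s a small sketch of how I read it out of the incoming server messages; the serverContent.outputTranscription.text path is my reading of the response shape, so double-check it against the docs:

# Sketch: pull the transcribed text out of an incoming server message.
# Assumes the message is JSON and that the transcription lives under
# serverContent.outputTranscription.text (my reading of the docs).
import json

def extract_transcription(raw_message: str) -> str | None:
    msg = json.loads(raw_message)
    transcription = msg.get("serverContent", {}).get("outputTranscription", {})
    return transcription.get("text")  # None if this message carries no text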

Hopefully this helps you fix your issues too!
BR Michael

You can see the new WebSocket URL I used in the API docs here.