2.5 flash audio native - output broken in DE

I’m using the 2.5 flash preview native audio dialog model in my app and, seemingly at random, it doesn’t finish its output.

The output will just abruptly end, because it didn’t finish generating the entire answer.
Now this can happen with long and short answers.

I was talking with the model when it happened and it told me that indeed it was a mistake on the model side and the generation abruptly ended. Of course I don’t know how reliable that answer is, but after troubleshooting for 2 days and seeing that all the chunks that I get also get output, this is the most likely reason at the moment.
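One way to check whether the cut-off happens before playback is to log a per-turn summary of what the server actually sent. The following is a minimal sketch, not the code from my app: `TurnEvent` is a hypothetical stand-in for whatever your receive loop yields, and the 24 kHz, 16-bit mono PCM output format is an assumption you should verify against the Live API documentation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumption: output audio is raw 16-bit mono PCM at 24 kHz.
SAMPLE_RATE_HZ = 24_000
BYTES_PER_SAMPLE = 2


@dataclass
class TurnEvent:
    """Hypothetical shape of one server message, flattened for logging."""
    audio: Optional[bytes] = None      # raw PCM chunk, if any
    transcript: Optional[str] = None   # output transcription fragment, if any
    turn_complete: bool = False        # True on the message that closes the turn


def summarize_turn(events: Iterable[TurnEvent]) -> dict:
    """Collect what was received during one model turn."""
    audio_bytes = 0
    chunks = 0
    text_parts = []
    saw_turn_complete = False
    for ev in events:
        if ev.audio:
            audio_bytes += len(ev.audio)
            chunks += 1
        if ev.transcript:
            text_parts.append(ev.transcript)
        if ev.turn_complete:
            saw_turn_complete = True
    transcript = "".join(text_parts).strip()
    return {
        "chunks": chunks,
        "audio_seconds": audio_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE),
        "transcript": transcript,
        "saw_turn_complete": saw_turn_complete,
        # Heuristic only: the transcript does not end in sentence punctuation.
        "ends_mid_sentence": bool(transcript) and transcript[-1] not in ".!?",
    }
```

If the duration you actually play back matches `audio_seconds` and the transcript still ends mid-sentence, the truncation most likely happened before the chunks reached the client.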

While I was talking with Gemini about the issue, the issue itself was not happening, so I went back with it to the initial topic, which was historic information about a city, to reproduce it and indeed it happened again.

Are there any safety filters that might trigger this depending on the language? So far it has only happened in German (I’m testing in German and English).


I’ve seen it happen in English too. Also, sometimes the first turn just fails with “Internal Server Error” as a websocket message, so I’d assume these are issues with the preview release.
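As a stopgap for the first-turn failure, a client can retry the connection. This is only a sketch under assumptions: `connect` is a hypothetical callable that opens your session and raises on failure, and the attempt count and delays are illustrative rather than recommended values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    connect: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect(), retrying with exponential backoff on any exception."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return connect()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

This does not help with output that stops mid-turn; it only covers the case where the session fails right at the start.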

Do you recall any circumstances that made it happen more often?

https://discuss.ai.google.dev/t/gemini-native-audio-api-usage-limits-tier-3-production-readiness/88181
Here it was said that “it’s production ready” - so I’m not so sure about the preview state

I was doing the exact same thing multiple times, so no, I don’t know what the difference would be. I wouldn’t take a comment in a forum as gospel. AFAIK there has been no official announcement and the model ID still says “preview”.

Hi @AIchievable @Martin_at_Cobbery,

Thank you for bringing this to our attention. We truly appreciate you flagging this issue, and we will file a bug internally.

Still the same issue: the model API response stops after some time, and the audio stops midway.

Hi @saqlain_ahmed,

Could you please share a code snippet or the prompt you are using, along with what you’ve tried so far? That would be very helpful. Thanks!

Our current config uses response_modalities=["AUDIO"] with voice_name="puck". The session is set up with activity_handling=NO_INTERRUPTION, proactive audio enabled, and no custom context window compression.

Could this behavior be related to context length limits (prompt accumulation), or is it more likely due to a session termination or audio stream cutoff on the server side? I ask because the model does continue once the user talks again.

config

from google.genai import types

# SYSTEM_INSTRUCTION is defined elsewhere in the app (not shown here).
CONFIG = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    media_resolution=types.MediaResolution.MEDIA_RESOLUTION_LOW,
    # Context window compression is currently disabled:
    # context_window_compression=(
    #     # Configures compression with default parameters.
    #     types.ContextWindowCompressionConfig(
    #         trigger_tokens=28000,
    #         sliding_window=types.SlidingWindow(target_tokens=13774),
    #     )
    # ),
    speech_config=types.SpeechConfig(
        language_code="en-US",
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="puck")
        ),
    ),
    realtime_input_config=types.RealtimeInputConfig(
        # Keep server-side voice activity detection enabled.
        automatic_activity_detection=types.AutomaticActivityDetection(disabled=False),
        # User speech does not interrupt the model's response.
        activity_handling=types.ActivityHandling.NO_INTERRUPTION,
    ),
    # Request transcripts for both the user's input and the model's audio output.
    input_audio_transcription=types.AudioTranscriptionConfig(),
    output_audio_transcription=types.AudioTranscriptionConfig(),
    system_instruction=SYSTEM_INSTRUCTION,
    proactivity=types.ProactivityConfig(
        proactive_audio=True
    ),
)
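To narrow down whether prompt accumulation is involved, one option is to record how many tokens the session had used at each turn and compare truncated turns against complete ones. This is a rough sketch under assumptions: `TurnRecord` is a hypothetical structure, and where `total_tokens` comes from (for example, usage metadata on server messages) depends on your client.

```python
from dataclasses import dataclass
from statistics import median
from typing import Iterable


@dataclass
class TurnRecord:
    """Hypothetical per-turn log entry."""
    total_tokens: int  # cumulative session tokens when the turn ended
    truncated: bool    # True if the audio stopped mid-answer


def truncation_vs_context(records: Iterable[TurnRecord]) -> dict:
    """Compare session token counts for truncated vs. complete turns."""
    records = list(records)
    cut = [r.total_tokens for r in records if r.truncated]
    ok = [r.total_tokens for r in records if not r.truncated]
    return {
        "truncated_turns": len(cut),
        "min_tokens_at_truncation": min(cut) if cut else None,
        "median_tokens_truncated": median(cut) if cut else None,
        "median_tokens_complete": median(ok) if ok else None,
    }
```

If truncated turns already show up at low token counts, well below any compression trigger such as the 28000 in the commented-out config, context length is an unlikely explanation and a server-side cutoff becomes more plausible.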

I’m having the same problem. I’m getting terrible results from the new gemini-2.5-flash-native-audio-preview-09-2025 in general: breaks up mid-sentence, gets stuck, speaks nonsense and so on. I’m worried that they’re considering this generally available and deprecating the other models on Dec 9. Looks like a lot of Gemini-based applications will soon break :frowning: