We have been running Gemini 2.5 native audio dialog in production, and over the past week we noticed that responses randomly started being cut off or interrupted (the model does not generate the complete audio response).
The returned output transcription contains a full sentence of, say, 10 words, but the returned audio speaks only the first 1.5 - 2.5 words, cuts off, and then stays silent for a few moments.
On which platform:
on AI Studio and the Live API
When it happens:
responseTokenCount is a multiple of 50:
We noticed that if message.usageMetadata.responseTokenCount is a multiple of 50 (responseTokenCount % 50 === 0), the audio generated by the assistant for that specific turn will not be complete (a detection sketch is included at the end of this post)
at what time:
we noticed that it can happen at the beginning of the first response, in the middle of a conversation, or when the model gets interrupted often
We tried:
muting incoming audio:
in 1 out of 5 attempts the initial greeting was cut off after the assistant said 2 of the 10 words it needed to say
changing temperature:
we varied the temperature between 0.6 and 1 (no difference in occurrence)
changing voices
different combinations:
we tried different combinations of enabling and disabling affective dialog, proactivity, speech sensitivity, and VAD
Assumptions:
it seems to happen more often during peak hours
this started happening around June 17
Closing thoughts:
who should we contact regarding this, since it badly affects our product?
we could provide an in-depth analysis and a log of the messages (events) sent to us by the Live API while we were testing this
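Below is roughly how we flag suspect turns on our side. This is a minimal sketch using the @google/genai JS SDK; the model name, the outputAudioTranscription config flag, and the exact message shape are our assumptions based on what the Live API sends us, not an official recipe. The only thing the report depends on is the usageMetadata.responseTokenCount check.

```ts
import { GoogleGenAI, Modality } from "@google/genai";

// Model name here is illustrative; we see the same behaviour in AI Studio and via the Live API.
const MODEL = "gemini-2.5-flash-preview-native-audio-dialog";

async function monitorLiveSession(apiKey: string) {
  const ai = new GoogleGenAI({ apiKey });

  const session = await ai.live.connect({
    model: MODEL,
    config: {
      responseModalities: [Modality.AUDIO],
      outputAudioTranscription: {}, // assumed config flag; we also read the output transcription
    },
    callbacks: {
      onmessage: (message) => {
        // usageMetadata shows up on some server messages; whenever the turn's
        // responseTokenCount lands on an exact multiple of 50, the audio for
        // that turn is truncated after the first couple of words.
        const count = message.usageMetadata?.responseTokenCount;
        if (typeof count === "number" && count > 0 && count % 50 === 0) {
          console.warn(
            `Suspect turn: responseTokenCount=${count} (multiple of 50), audio likely cut off`
          );
        }
      },
      onerror: (e) => console.error("Live API error", e),
      onclose: () => console.log("Live API session closed"),
    },
  });

  return session;
}
```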
For me it feels quite random, because it can happen on the 2nd answer, the 8th, or whenever, but one thing you mentioned seems interesting: that it happens more often when interrupted. For me the frequency actually seemed to change based on the topic, but maybe that's just coincidence.
Hey, it’s interesting that you mentioned June 17th: that’s when Google released a bunch of Gemini 2.5 models to public GA, including this model (or a variant of it) on Vertex AI. We also noticed that the model is now worse than it used to be and is not at all consistent in its performance. I hope this gets addressed soon.
@chunduriv @Logan_Kilpatrick hasn’t anyone at Google noticed that native audio is causing errors? How come previews are not versioned with dates, just released? What is going on with Google’s standards?
Thanks for the thread - I see the same problems with the model stopping. It seems to happen on both input and output. When it asks me a question, I respond and get no follow-up. After a few “hello?”-type nudges, it finally follows up on the response I gave, so it wasn’t actually missed, just frozen.
On the output side too, it sometimes just stops midway. With some nudging like “continue”, it may finally continue right where it left off, or sometimes it restarts.
It’s generally pretty random where the issues happen, but almost every conversation I try is affected at some point, so basically it’s not working. I have no stopping issues with gemini-live-2.5-flash-preview, so I don’t think it’s an integration issue (I am using the JS SDK from the browser with an ephemeral token; a sketch of my setup follows below), and I’m continuing with that model for now. But the audio is so much better with native audio, so I’m looking forward to it becoming usable. Note: my teammate is really pushing me to switch to the OpenAI Realtime API, which I’m currently resisting but may not be able to for much longer.
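For reference, my browser-side wiring is roughly the sketch below. The ephemeral-token handoff (minting it on my backend and passing it in place of an API key with the v1alpha API version) and the serverContent part handling are assumptions about my own setup, not an official recipe.

```ts
import { GoogleGenAI, Modality } from "@google/genai";

// Browser side. `ephemeralToken` is assumed to be minted by my backend and handed to the page.
async function startLiveSession(ephemeralToken: string) {
  const ai = new GoogleGenAI({
    apiKey: ephemeralToken, // assumption: the ephemeral token stands in for the API key
    httpOptions: { apiVersion: "v1alpha" },
  });

  return ai.live.connect({
    // The half-cascade model that does NOT exhibit the stopping issue for me:
    model: "gemini-live-2.5-flash-preview",
    config: { responseModalities: [Modality.AUDIO] },
    callbacks: {
      onmessage: (msg) => {
        // Audio comes back as base64 PCM chunks inside serverContent parts;
        // here I just log the chunk sizes instead of feeding an AudioContext.
        const parts = msg.serverContent?.modelTurn?.parts ?? [];
        for (const part of parts) {
          if (part.inlineData?.data) {
            console.log("audio chunk, base64 length:", part.inlineData.data.length);
          }
        }
      },
      onerror: (e) => console.error("live session error", e),
      onclose: () => console.log("live session closed"),
    },
  });
}
```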
I am having issues with this as well. It’s nice to know that at least (or so I hope) it’s specific to just the Flash model and that somehow this is some cost-saving measure. For me, the issue is that sometimes Gemini switches into transcription mode, and for the rest of the conversation it only works from text transcripts; I can’t get it to analyze raw audio. The other half of the time it does analyze the raw audio and stays in raw audio mode, which is great because analyzing raw audio is exactly what I need. This happens in Google AI Studio too with the Pro model, so I’m assuming it’s not actually just the Flash model. It’s really frustrating because I just want one consistent behavior; it switches so much.
I’ve been following the ongoing discussions about Gemini 2.5 Native Audio cutting off when responseTokenCount lands on a multiple of 50 tokens.
Just wanted to ask:
Has there been any official fix, patch, or recommended workaround for this issue?
I’m currently experiencing the same behavior: audio plays for just a few words and then cuts to silence when the token count hits a multiple of 50 — even though the model continues streaming text.
@chunduriv @Logan_Kilpatrick can we get any updates? 45 days and Google still hasn’t fixed this? How much longer is this model going to be unusable? Any info?
Logan responded on the GitHub issue in the genai Python SDK repo that “the fix will come in a new model release”, so I would just give up on this model for now.
Hi @Lalit_Kumar, could you please investigate this issue with your team? It has been persisting for the past 2 months (since the last model release), with no updates or responses regarding a fix since then. Thanks
We sent him multiple messages on X and multiple emails, with no reply. I don’t understand Google breaking the model on June 17th and then not responding or notifying anyone about it. @Logan_Kilpatrick
With the release of gpt-realtime, I thought I’d give Gemini native audio another try before potentially switching over to OpenAI, and it feels even worse than before. The audio quality itself, with a direct browser WebSocket connection + ephemeral token, is choppy, which I don’t remember being an issue before, and it still has the stopping issues as well as never calling functions. From the other comments it sounds like we shouldn’t have much hope for this model. I’m also a bit concerned whether the WebSocket connection itself is sustainable, since even with a few web workers I’m worried it can’t achieve stable audio. Here’s hoping for a future model that works better and can also be accessed via WebRTC; I’ll give Gemini another try then.