We have been running Gemini 2.5 native audio dialog in production, and this past week we noticed that responses randomly started being cut off or interrupted (the model does not generate the complete audio response).
The returned output transcription contains a full sentence of, say, 10 words, but the returned audio speaks only the first 1.5–2.5 words before it cuts off, and then it stays silent for a few moments.
On which platform:
On AI Studio and the Live API.
When it happens:
responseTokenCount is a multiple of 50:
We noticed that if message.usageMetadata.responseTokenCount is a multiple of 50 (responseTokenCount % 50 === 0), the audio the assistant generates for that specific turn will not be complete.
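The multiple-of-50 pattern above is easy to check programmatically. Here is a minimal sketch of the heuristic: the `usageMetadata.responseTokenCount` field is the one discussed in this thread, while the function name and mocked message objects are just illustrative.

```javascript
// Heuristic from this report: turns whose responseTokenCount is an exact
// multiple of 50 tend to arrive with truncated audio. Flag such turns so
// the app can log, warn, or retry. The message shape follows the
// usageMetadata field described above; everything else is illustrative.
function isLikelyTruncatedTurn(message) {
  const count = message?.usageMetadata?.responseTokenCount;
  // Guard against missing metadata; 0 tokens is not a real turn.
  return Number.isInteger(count) && count > 0 && count % 50 === 0;
}

// Example with mocked server messages:
console.log(isLikelyTruncatedTurn({ usageMetadata: { responseTokenCount: 150 } })); // true
console.log(isLikelyTruncatedTurn({ usageMetadata: { responseTokenCount: 149 } })); // false
```

This only flags turns that match the observed correlation; it cannot tell whether a flagged turn's audio was actually cut off.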
At what time:
We noticed that it can happen at the beginning of the first response, in the middle of a conversation, or when the model gets interrupted often.
We tried:
muting incoming audio:
In 1 out of 5 attempts, the initial greeting was cut off after the assistant said 2 of the 10 words it needed to say.
changing temperature:
We varied the temperature between 0.6 and 1 (no difference in occurrence).
changing voices
different combinations:
Different combinations of enabling and disabling affective dialog, proactivity, speech sensitivity, and VAD.
Assumptions:
It happens more often during peak hours.
This started happening around June 17.
Closing thoughts:
Who should we contact regarding this? It affects our product badly.
We can provide an in-depth analysis and a log of the messages (events) the Live API sent us while we were testing this.
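The kind of per-turn log we could share can be sketched like this: given the server messages from one session, summarize each turn's token count and flag the multiple-of-50 pattern described above. The `usageMetadata.responseTokenCount` field comes from this thread; the function name and message objects are illustrative.

```javascript
// Summarize a session's server messages for debugging: one entry per turn
// that carries usage metadata, with the suspect multiple-of-50 flag.
function summarizeSession(messages) {
  return messages
    // Keep only messages that actually report a token count.
    .filter((m) => Number.isInteger(m?.usageMetadata?.responseTokenCount))
    .map((m, turn) => {
      const tokens = m.usageMetadata.responseTokenCount;
      return { turn, tokens, suspectTruncation: tokens > 0 && tokens % 50 === 0 };
    });
}

// Example with mocked messages:
console.log(summarizeSession([
  { usageMetadata: { responseTokenCount: 73 } },
  { usageMetadata: { responseTokenCount: 150 } },
]));
// [ { turn: 0, tokens: 73, suspectTruncation: false },
//   { turn: 1, tokens: 150, suspectTruncation: true } ]
```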
For me it feels quite random, because it can happen on the 2nd answer, the 8th, or whenever, but one thing you mentioned is interesting: it does seem to happen more often when the model is interrupted. For me the frequency actually varied with the topic, but maybe that's just coincidence.
Hey, it's interesting that you mentioned June 17th: that's when Google released a bunch of Gemini 2.5 models to public GA, including this model (or a variant of it) on Vertex AI. We also noticed that the model is now worse than it used to be, and its performance is not consistent at all. I hope this gets addressed soon.
@chunduriv @Logan_Kilpatrick Hasn't anyone at Google noticed that native audio is causing errors? How come previews are not versioned with dates, just released? What is going on with Google's standards?
Thanks for the thread - I see the same problems with the model stopping. It seems to happen on both input and output. When it asks me a question, I respond and get no follow-up. After a few "hello?"-type nudges, it finally follows up on the response I gave, so my input wasn't actually missed; the session was just frozen.
On the output side, too, it sometimes just stops midway. With some nudging like "continue", it may pick up right where it left off, or sometimes it restarts.
It's generally pretty random where the issues happen, but almost every conversation I try is affected at some point, so it's basically not working. I have no stopping issues with gemini-live-2.5-flash-preview, so I don't think it's an integration issue (I am using the JS SDK from the browser with an ephemeral token), and I'm continuing with that model for now. But the audio is so much better with native audio that I'm looking forward to it becoming usable. Note: my teammate is really pushing me to switch to the OpenAI Realtime API, which I'm currently resisting, but I may not be able to much longer.
I am having issues with this as well. It's nice to know at least that it may be specific to just the Flash model (or I hope it is, and that this is somehow a cost-saving measure). For me, the issue is that sometimes Gemini switches into transcription mode, and for the rest of the conversation it only gets text transcripts; I can't get it to analyze raw audio. The other half of the time it does analyze the raw audio and stays in raw-audio mode, which is great because analyzing raw audio is exactly what I need. This happens in Google AI Studio too, with the Pro model, so I'm assuming it's not actually just the Flash model. It's really frustrating, because I just want one consistent behavior; it switches so much.