Inconsistent Response Behavior in gemini-2.5-flash-native-audio-preview-09-2025 Voicebot

Hi everyone,
I’m building a real-time Hebrew voicebot using the gemini-2.5-flash-native-audio-preview-09-2025 model, and I’m running into inconsistent behavior that I can’t fully explain.

The issue:
Sometimes the model simply doesn’t answer at all. The bot receives the audio input, but there’s no response from Gemini. After several attempts (sometimes 3–5 retries), it suddenly responds normally. Other times, the entire flow works perfectly from the first message, without any delays or failures.

What I’ve confirmed so far:
• The audio stream is being sent correctly
• The STT + request payload is valid
• No errors are returned from the API
• The problem is intermittent and unpredictable
• When it works, it works flawlessly

What I’m trying to understand:
• Is this a known issue with the current preview model?
• Are there recommended settings, timeouts, or event-handling mechanisms to improve stability?
• Could this be related to rate limits, streaming configuration, or model warm-up behavior?
• Is there any diagnostic logging I should enable to better understand the silent failures?
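On the timeout/event-handling question: one pattern that helps with silent failures like this is a client-side watchdog around the response stream. Below is a minimal, runnable sketch of the idea (all names and timeouts are illustrative, not an official SDK API): wrap whatever async iterator delivers the model's responses, treat a turn with no message within `timeout` seconds as a silent failure, and re-drive the turn instead of hanging.

```python
import asyncio

# Hypothetical watchdog around a model response stream. In a real
# integration the stream would come from the Live session; here we
# demo with fake async generators so the sketch is self-contained.

class SilentTurnError(Exception):
    pass

async def collect_with_watchdog(stream, timeout):
    """Collect all messages of one turn; raise on a silent turn."""
    messages = []
    it = stream.__aiter__()
    while True:
        try:
            msg = await asyncio.wait_for(it.__anext__(), timeout)
        except StopAsyncIteration:
            return messages
        except asyncio.TimeoutError:
            raise SilentTurnError(f"no response within {timeout}s")
        messages.append(msg)

async def run_turn_with_retries(make_stream, retries=3, timeout=5.0):
    """Re-drive the whole turn when the model stays silent."""
    for attempt in range(1, retries + 1):
        try:
            return await collect_with_watchdog(make_stream(), timeout)
        except SilentTurnError:
            if attempt == retries:
                raise
            # In a real integration: log, close the session, reconnect.

# Demo: the first attempt hangs, the second answers normally.
async def _silent():
    await asyncio.sleep(10)
    yield "late"

async def _ok():
    yield "audio-chunk-1"
    yield "audio-chunk-2"

_attempts = iter([_silent, _ok])
result = asyncio.run(
    run_turn_with_retries(lambda: next(_attempts)(), retries=3, timeout=0.2)
)
print(result)  # ['audio-chunk-1', 'audio-chunk-2']
```

This doesn't fix the underlying model behavior, but it at least makes the 3–5-retry dance automatic and logs when a turn went silent, which is useful data to attach to a bug report.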


Hi!

I can confirm this: we’re facing the same issue in our own telephony integration, and it’s getting really frustrating in production use.

On top of that, I’d like to add that this inconsistency extends to tool use as well, e.g.:

  • The model sometimes doesn’t execute a tool when it should (roughly 7 out of 10 tool calls are executed and 3 out of 10 are ignored);
  • The model acknowledges to the user that it needs to invoke a tool, as described in the system instructions (e.g., “I’ll look into that, give me a brief moment please…”), but doesn’t use any tool afterwards; it just remains silent until the user speaks again to trigger inference from the model;
  • The model makes a tool call correctly, but then remains silent and doesn’t communicate any result to the user until the user speaks again to trigger inference from the model (we have verified that we’re correctly sending tool call results back to the model).

(Please note that all of the expected behavior described above, which the model fails to follow, is thoroughly and clearly described in the system instructions.)
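For the "acknowledges but never calls the tool" and "calls the tool but never speaks the result" cases, a client-side guard can at least recover the conversation. Here is a hedged sketch (all names, thresholds, and the nudge mechanism are illustrative assumptions, not SDK features): track the events of a turn, and if the model goes silent mid-flow for longer than a grace period, have the client send a short nudge turn (e.g. a text input like "please continue") instead of waiting for the user.

```python
import time

# Hypothetical per-turn tracker. The surrounding client would call the
# on_* methods as events arrive and poll should_nudge() on a timer.
class TurnTracker:
    def __init__(self, grace_seconds=4.0):
        self.grace = grace_seconds
        self.last_event_at = time.monotonic()
        self.tool_call_seen = False
        self.tool_result_sent = False
        self.audio_after_result = False

    def on_model_audio(self):
        self.last_event_at = time.monotonic()
        if self.tool_result_sent:
            self.audio_after_result = True

    def on_tool_call(self):
        self.last_event_at = time.monotonic()
        self.tool_call_seen = True

    def on_tool_result_sent(self):
        self.last_event_at = time.monotonic()
        self.tool_result_sent = True

    def should_nudge(self, now=None):
        """True if the model went silent mid-flow beyond the grace period."""
        now = time.monotonic() if now is None else now
        if now - self.last_event_at < self.grace:
            return False
        # Silent after we returned a tool result, or silent after
        # acknowledging without ever issuing the expected tool call.
        if self.tool_result_sent and not self.audio_after_result:
            return True
        if not self.tool_call_seen:
            return True
        return False

# Example: model acknowledged, called the tool, we sent the result,
# then nothing came back for 5 seconds.
t = TurnTracker(grace_seconds=4.0)
t.on_model_audio()        # "I'll look into that, one moment..."
t.on_tool_call()
t.on_tool_result_sent()
stalled = t.should_nudge(now=t.last_event_at + 5.0)
print(stalled)  # True -> client should send a nudge turn
```

It's a workaround, not a fix, but in our experience forcing inference this way is much better for the caller than dead air until they speak again.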

Lastly, we have tried playing with temperature and the proactive-dialog settings, but the inconsistency persists and remains a significant issue.

Thanks to Google for paying attention to this matter.

Cheers


Hi,

I’m backing this too. We are currently running gemini-2.5-flash-live-preview in production with reasonable stability, while our tests with the native audio model show a lot of latency and voice-generation issues.

Unfortunately, this model will be deprecated on December 9, and we will be forced to use the native model, which does not have the quality we expect. And, as far as I know, there won’t be any extension for gemini-2.5-flash-live-preview, and there haven’t been any updates to the native-09-preview model.


Hi!

We’ve been seeing the same behavior on our side as well. In our case, things became much more stable after disabling Gemini’s built-in VAD, switching to a manual VAD pipeline, and making sure we explicitly mark start_activity and end_activity boundaries with correct timing. Without tight control of the activity windows, we noticed the model would either stay silent or fail to resume properly after long user turns.
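For anyone who wants to try the manual-VAD route, here is a minimal, self-contained sketch of the kind of pipeline we mean (the threshold, frame size, and hangover values are illustrative and need tuning for your audio): compute per-frame RMS energy over 16-bit PCM, emit a "start_activity" event on the first loud frame, and an "end_activity" event only after several consecutive quiet frames. Each emitted event is where you would send the corresponding explicit activity signal to the session.

```python
import struct
import math

def frame_rms(frame_bytes):
    """RMS energy of one frame of 16-bit little-endian mono PCM."""
    n = len(frame_bytes) // 2
    samples = struct.unpack(f"<{n}h", frame_bytes)
    return math.sqrt(sum(s * s for s in samples) / n)

def vad_events(frames, threshold=500.0, hangover=3):
    """Yield (frame_index, event) pairs over an iterable of PCM frames.

    `hangover` quiet frames are required before closing the activity
    window, so brief pauses inside a user turn don't end it early.
    """
    active = False
    quiet_run = 0
    for i, frame in enumerate(frames):
        loud = frame_rms(frame) >= threshold
        if loud and not active:
            active = True
            quiet_run = 0
            yield (i, "start_activity")    # -> send activity-start here
        elif loud:
            quiet_run = 0
        elif active:
            quiet_run += 1
            if quiet_run >= hangover:
                active = False
                yield (i, "end_activity")  # -> send activity-end here

# Demo: 2 quiet frames, 3 loud frames, 4 quiet frames (160 samples each).
quiet = struct.pack("<160h", *([0] * 160))
loud = struct.pack("<160h", *([8000] * 160))
frames = [quiet] * 2 + [loud] * 3 + [quiet] * 4
events = list(vad_events(frames))
print(events)  # [(2, 'start_activity'), (7, 'end_activity')]
```

The hangover is the part that mattered most for us: ending the activity window too eagerly after a long user turn was exactly when the model tended to stay silent.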

On a related note, I’ll leave here a post + repo where I documented the limitations we found around long live sessions, context growth, and how Gemini behaves when the session gets too long (including cases where responses stop arriving or the model interrupts itself). It might help others facing the same symptoms until the preview model becomes more stable.