Severe Degradation in Gemini Flash 2.0 API Performance — Tool Use and Output Quality Affected

Hi everyone,

We’re currently experiencing major performance issues with the Gemini Flash 2.0 API, particularly affecting tool use reliability and output coherence in structured conversation flows.

We’re building a voice-based interview agent, and during a high-stakes filmed demo today, the agent failed to progress through its stages. Despite no code or infra changes on our end, the LLM stopped reliably calling the tools that drive stage logic and store candidate responses. This resulted in conversation loops, hallucinations, and general inconsistency, none of which were present in previous tests (two days prior).
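For context, here is a minimal sketch of the kind of setup we depend on, using the google-genai Python SDK with automatic function calling. The function names, prompts, and environment variable are hypothetical stand-ins, not our actual implementation:

```python
# Minimal sketch (hypothetical): an interview chat that relies on tool calls
# to advance stages and persist answers. Assumes the google-genai Python SDK
# (pip install google-genai) and GEMINI_API_KEY set in the environment.
import os
from google import genai
from google.genai import types


def advance_stage(next_stage: str) -> dict:
    """Move the interview to the next stage (hypothetical stand-in)."""
    print(f"advancing to stage: {next_stage}")
    return {"status": "ok", "stage": next_stage}


def store_answer(question: str, answer: str) -> dict:
    """Persist a candidate's answer (hypothetical stand-in)."""
    print(f"storing answer for: {question}")
    return {"status": "stored"}


client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Passing plain Python callables as tools enables the SDK's automatic
# function calling, so the model is expected to invoke them to drive the flow.
chat = client.chats.create(
    model="gemini-2.0-flash",
    config=types.GenerateContentConfig(
        system_instruction=(
            "You are an interview agent. Use advance_stage to move between "
            "stages and store_answer to record every response."
        ),
        tools=[advance_stage, store_answer],
    ),
)

reply = chat.send_message("Hi, I'm ready to start the interview.")
print(reply.text)
```

When the model stops emitting these tool calls, the agent has no way to progress or persist anything, which is exactly the failure mode we saw.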

We’ve since rerun our test suite and observed a clear degradation in quality, independent of any updates on our side. This aligns with past reports from the community suggesting instability around new model rollouts or internal infrastructure changes:

Reddit thread on Gemini 2.0 Flash failing evals

Google AI Forum discussion

One unconfirmed theory is that the recent free availability of Gemini 2.5 Pro may be triggering infra shifts or quantization on the Flash 2.0 side, degrading its output. If this is the case, we’d love to understand what’s happening behind the scenes and whether this is a temporary phase.

Would appreciate any clarity from the Gemini team or confirmation that this is being looked into — and if other developers are noticing similar issues, let’s compare notes.

Thanks!


Hi @Victor_Billaud,

Welcome to the Google AI Forum! :confetti_ball: :confetti_ball:

Apologies for the delay in response.

We have made several improvements to tool calling in the 2.5 family of models over the last couple of months. I would highly encourage switching to the 2.5 family.
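If it helps, switching is typically just a change of model id in the same call. A minimal sketch with the google-genai SDK, where the model id, prompt, and placeholder tool are illustrative rather than a prescribed setup:

```python
# Hypothetical sketch of the same tool-calling request against a 2.5 family
# model; only the model id changes. Assumes the google-genai Python SDK and
# GEMINI_API_KEY set in the environment.
import os
from google import genai
from google.genai import types


def store_answer(question: str, answer: str) -> dict:
    """Placeholder tool, standing in for your real storage function."""
    return {"status": "stored"}


client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-2.5-flash",  # was "gemini-2.0-flash"
    contents="Please record that the candidate's favourite language is Python.",
    config=types.GenerateContentConfig(tools=[store_answer]),
)
print(response.text)
```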

Also, do let me know if you are still running into issues.