Severe Degradation in Gemini 2.0 Flash API Performance: Tool Use and Output Quality Affected

Hi everyone,

We’re currently experiencing major performance issues with the Gemini 2.0 Flash API, particularly affecting tool use reliability and output coherence in structured conversation flows.

We’re building a voice-based interview agent, and during a high-stakes filmed demo today the agent failed to progress through its stages. Despite no code or infrastructure changes on our end, the model stopped calling its tools reliably; those tool calls are critical for managing the interview logic and storing candidate responses. The result was conversation loops, hallucinations, and general inconsistency, none of which were present in our tests two days prior.
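
For context, the agent drives its stage logic entirely through function calling. The snippet below is a minimal sketch of that pattern using the google-genai Python SDK, with hypothetical tool names (`advance_stage`, `store_response`) standing in for our production setup; turns like this one were handled via a tool call in our earlier tests, and the model now often replies with free-form text instead:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Hypothetical stand-ins for our real tools: one advances the interview
# stage machine, the other persists the candidate's answer.
interview_tools = types.Tool(
    function_declarations=[
        types.FunctionDeclaration(
            name="advance_stage",
            description="Move the interview to the next stage once the current one is complete.",
            parameters=types.Schema(
                type=types.Type.OBJECT,
                properties={"next_stage": types.Schema(type=types.Type.STRING)},
                required=["next_stage"],
            ),
        ),
        types.FunctionDeclaration(
            name="store_response",
            description="Persist the candidate's answer to the current question.",
            parameters=types.Schema(
                type=types.Type.OBJECT,
                properties={
                    "question_id": types.Schema(type=types.Type.STRING),
                    "answer": types.Schema(type=types.Type.STRING),
                },
                required=["question_id", "answer"],
            ),
        ),
    ]
)

config = types.GenerateContentConfig(tools=[interview_tools])

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="The candidate has finished answering question 3. Record the answer and move on.",
    config=config,
)

# In earlier tests this reliably came back as function_call parts; now the
# response is often plain text and the stage machine stalls.
function_calls = [
    part.function_call
    for part in (response.candidates[0].content.parts or [])
    if part.function_call
]
print(function_calls if function_calls else response.text)
```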

We’ve since rerun our test suite and observed a clear degradation in quality, independent of any updates on our side; a sketch of the kind of check we rerun is included after the links below. This aligns with past reports from the community suggesting instability around new model rollouts or internal infrastructure changes:

Reddit thread on Gemini 2.0 Flash failing evals

Google AI Forum discussion
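
If other developers want to compare notes with numbers rather than impressions, a minimal version of the kind of check we rerun looks roughly like this (sketch only: the model ID, tool declaration, and prompts here are illustrative stand-ins, not our actual eval suite). It replays the same scripted turns and reports the fraction that come back with at least one function call, which makes the drop easy to compare across days on identical inputs:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Minimal stand-in for the full tool config shown in the earlier sketch.
config = types.GenerateContentConfig(
    tools=[
        types.Tool(
            function_declarations=[
                types.FunctionDeclaration(
                    name="advance_stage",
                    description="Move the interview to the next stage.",
                )
            ]
        )
    ]
)

# Illustrative scripted turns; a real suite would replay recorded interview transcripts.
PROMPTS = [
    "The candidate has finished answering question 3. Record the answer and move on.",
    "The introduction stage is complete; proceed to the technical questions.",
]

def tool_call_rate(model: str) -> float:
    """Fraction of turns on which the model emits at least one function call."""
    hits = 0
    for prompt in PROMPTS:
        resp = client.models.generate_content(model=model, contents=prompt, config=config)
        parts = resp.candidates[0].content.parts or []
        if any(p.function_call for p in parts):
            hits += 1
    return hits / len(PROMPTS)

print(tool_call_rate("gemini-2.0-flash"))
```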

One theory (unconfirmed) is that the recent free availability of Gemini 2.5 Pro may be triggering infrastructure shifts or quantization on the 2.0 Flash side, degrading its output. If that’s the case, we’d love to understand what’s happening behind the scenes and whether this is a temporary phase.

Would appreciate any clarity from the Gemini team or confirmation that this is being looked into — and if other developers are noticing similar issues, let’s compare notes.

Thanks!
