We were using the 04-17 Gemini preview and had fine-tuned the system prompt to the point where it solved most, if not all, of our use cases. When that version was deprecated on the 15th of July and we switched to the latest version, the model failed every single one of those use cases. I assumed it had something to do with the hyperparameters, so I reset them to their defaults; a few use cases then worked, and we tuned them further. Even so, the variation in model quality between morning and afternoon, afternoon and evening, evening and night, and night and midnight was the difference between perfect execution and failing absolutely everything. We tried both AI Studio and Vertex AI and ran into the same situation. What went wrong? We suspected something in our code, so we tested one agent of ours that had been absolutely stable: it works with Gemini 2.0 Flash and fails with Gemini 2.5 Flash. (PS: we use 2.5 Flash with thinking disabled.) As stated earlier, it used to work successfully and now fails miserably.
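For context, here is roughly how we call the model with thinking disabled. This is a minimal sketch assuming the google-genai Python SDK; the prompt is a placeholder, not our actual system prompt.

```python
# Minimal sketch of calling Gemini 2.5 Flash with thinking disabled,
# assuming the google-genai Python SDK. The prompt is a placeholder.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize the ticket below and extract the customer's request.",
    config=types.GenerateContentConfig(
        # Thinking disabled, as mentioned above.
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)
print(response.text)
```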
Hello,
Welcome to the Forum,
For consistency in model output, I would recommend setting the temperature to 0 and trying again, although you might have to adjust your prompt a little for better performance.
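For example, something like the following (a minimal sketch assuming the google-genai Python SDK; the model name and query are placeholders):

```python
# Sketch of pinning temperature to 0 for more deterministic output,
# assuming the google-genai Python SDK. Model name and query are placeholders.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Your test query here",
    config=types.GenerateContentConfig(temperature=0),
)
print(response.text)
```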
Tried that as well. We get responses that start with text and then degenerate into long runs of whitespace or seemingly endless lines of "------", and the performance is even worse.
For a better response from the model, I would recommend refining your system prompt a bit more. Try to instruct the model step by step; being precise will help. If possible, set some explicit rules for the model to follow, and including an example is always helpful, as in the sketch below.
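Here is a rough sketch of that approach, assuming the google-genai Python SDK; the rules, the triage task, and the one-shot example are placeholders you would replace with your own domain.

```python
# Rough sketch of a rule-based system prompt with a one-shot example,
# assuming the google-genai Python SDK. Rules and example are placeholders.
from google import genai
from google.genai import types

SYSTEM_PROMPT = """You are a support-ticket triage agent.
Follow these rules in order:
1. Read the ticket and identify the customer's single main request.
2. Classify it as one of: billing, technical, account, other.
3. Respond with JSON only: {"category": "...", "summary": "..."}.

Example:
Ticket: "I was charged twice for my subscription this month."
Response: {"category": "billing", "summary": "Duplicate subscription charge"}
"""

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents='Ticket: "My app crashes whenever I open the settings page."',
    config=types.GenerateContentConfig(
        system_instruction=SYSTEM_PROMPT,
        temperature=0,
    ),
)
print(response.text)
```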
We already have a system prompt built on agentic design principles. We fine-tuned, iterated, and perfected it on an older version of 2.5 Flash, and it worked perfectly fine; now it has suddenly stopped working and fails all of our evals. We switched to OpenAI GPT-4.1 small because of the vast fluctuation in quality. We have had cases where it works in the evening but not at night, and in the afternoon it sometimes works and sometimes does not. As a general pattern, if I ran the same query 100 times, it would work 10 times and fail 90 times.
I can try to reproduce your issue if you share part of your code and your prompt.
I'm also experiencing the same behavior. I've noticed the model's performance degrade over time.
I even had a case where some information provided in the prompt was ignored.