We were using the 04-17 Gemini preview and had fine-tuned the system prompt to the point where it solved most, if not all, of our use cases. When that version was deprecated on the 15th of July and we switched to the latest version, the model failed every single one of those use cases. I assumed it had something to do with the hyperparameters, so I reset them to their defaults; a few use cases then worked, and we tuned them further. Even so, the variation in model quality between morning and afternoon, afternoon and evening, evening and night, and night and midnight was the difference between perfect execution and failing absolutely everything. We tried both AI Studio and Vertex AI and ran into the same situation. What went wrong? We suspected something in our code, so we tested one agent of ours that had been absolutely stable: it works with Gemini 2.0 Flash and fails with Gemini 2.5 Flash. (PS: we use 2.5 Flash with thinking disabled.) As stated earlier, it used to work successfully and now fails miserably.
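For context, here is roughly how we call the model with thinking disabled. This is a minimal sketch assuming the google-genai Python SDK; the prompt is a placeholder, not our actual system prompt.

```python
# Minimal sketch of calling Gemini 2.5 Flash with thinking disabled,
# assuming the google-genai Python SDK. The prompt is a placeholder.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize the ticket below and extract the customer's request.",
    config=types.GenerateContentConfig(
        # Thinking disabled, as mentioned above.
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)
print(response.text)
```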
Hello,
Welcome to the Forum,
For consistency in model output, I would recommend setting the temperature to 0 and trying again, although you might have to adjust your prompt a little for better performance.
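For example, something like the following (a minimal sketch assuming the google-genai Python SDK; the model name and query are placeholders):

```python
# Sketch of pinning temperature to 0 for more deterministic output,
# assuming the google-genai Python SDK. Model name and query are placeholders.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Your test query here",
    config=types.GenerateContentConfig(temperature=0),
)
print(response.text)
```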
Tried that as well. We get responses that start with text and then degenerate into long runs of whitespace or seemingly endless lines of "------", and the performance is even worse.
For a better response from the model, I would recommend refining your system prompt a bit more. Try to instruct the model step by step; being precise will help. If possible, set some explicit rules for the model to follow, and including an example is always helpful, as in the sketch below.
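Here is a rough sketch of that approach, assuming the google-genai Python SDK; the rules, the triage task, and the one-shot example are placeholders you would replace with your own domain.

```python
# Rough sketch of a rule-based system prompt with a one-shot example,
# assuming the google-genai Python SDK. Rules and example are placeholders.
from google import genai
from google.genai import types

SYSTEM_PROMPT = """You are a support-ticket triage agent.
Follow these rules in order:
1. Read the ticket and identify the customer's single main request.
2. Classify it as one of: billing, technical, account, other.
3. Respond with JSON only: {"category": "...", "summary": "..."}.

Example:
Ticket: "I was charged twice for my subscription this month."
Response: {"category": "billing", "summary": "Duplicate subscription charge"}
"""

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents='Ticket: "My app crashes whenever I open the settings page."',
    config=types.GenerateContentConfig(
        system_instruction=SYSTEM_PROMPT,
        temperature=0,
    ),
)
print(response.text)
```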
We already have a system prompt built on agentic design principles. We fine-tuned, iterated, and perfected it on an older version of 2.5 Flash, and it worked perfectly fine; now it has suddenly stopped working and fails all of our evals. We switched to OpenAI GPT-4.1 small because of the vast fluctuation in quality. We have had cases where it works in the evening but not at night, and in the afternoon it sometimes works and sometimes does not. As a general pattern, if I ran the same query 100 times, it would work 10 times and fail 90 times.
I can try to reproduce your issue if you share part of your code and your prompt.
I'm also experiencing the same behavior. I've noticed the model's performance degrade over time.
I even had a case where some information provided in the prompt was ignored.