Here to vent as well. Words can’t describe how much I hate this new checkpoint. I would rather use GPT-4, yes GPT-4. 05-06 is an atrocity! Please bring 03-25 back. 03-25 was amazing for pair programming and metacognition. I’m tired of all the labs (except Anthropic) optimizing for stupid benchmarks and “one-shotting” web apps. I thought this was about achieving AGI…
I know this thread is supposed to be about trust in Google, but since a lot of the conversation is about how non-functional the 05-06 release is for RAG, I thought it was important to let everyone know that I got my RAG processes working again by changing the max_output_tokens parameter (that’s the Python identifier) in the API to 32k (it defaults to 8192).
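For anyone who wants to try the same workaround, here’s a minimal sketch of how you might raise that limit, assuming the google-generativeai Python SDK; the model name and prompt are placeholders, so swap in whatever your pipeline already uses:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Placeholder model alias - use whichever 2.5 Pro endpoint your RAG pipeline targets.
model = genai.GenerativeModel("gemini-2.5-pro-preview-05-06")

response = model.generate_content(
    "…your prompt plus retrieved context here…",
    generation_config=genai.GenerationConfig(
        max_output_tokens=32768,  # raised from the 8192 default
    ),
)
print(response.text)
```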
This is a nightmare for me. Instruction following is so much worse than before. And redirecting to another model is a total no-go. It’s hard to invest in Google LLMs in the future…
I fully agree with the post. A dated endpoint should not change - this undermines Google’s standing, for me, as a reliable LLM provider to build AI-based apps on. Please revert the change and take responsibility for the error.
The new version is noticeably worse - one can cynically wonder whether it’s a cost-cutting measure, trading on the reputation of the previous, better model…
The old Gemini 2.5 Pro model was extremely good when it was released. Now the model thinks too much and just hallucinates. Please bring back the old model.
Please fix this, Google. It really doesn’t work.
100% Agree. The new checkpoint is disappointing. Please bring 03-25 back in API and AI Studio!
The May 2.5 is also a worse model than the March one for writing and prompt following. It also takes five minutes to replace “.” with “,” in a tiny text for some reason. It feels like a bad quant. Not everything is about coding. Why would you call it 2.5 Pro? It’s 2.1, not Pro.
Untrue - it follows instructions properly and as expected. The only thing that has definitely changed in AI Studio is how the NSFW filter behaves, but that’s not a critical downgrade by any means, and the writing quality itself has improved. Maybe look through your system instructions to see if any of them are ambiguous or poorly written.
I came searching for a thread like this – I wasn’t even aware of the update, but noticed a real downgrade in Gemini’s capacity to follow even basic instructions.
It’s utterly inappropriate and typical of Google to go and replace a checkpoint with a different version – clearly demonstrating that reliance on Google Cloud infrastructure is still misguided, as they’ll pull the rug if it benefits them, without real thought for the downstream effects.
Sure, the prior version is a preview, but public users on the Gemini app are seeing a form of shrinkflation: the model is less capable now despite users paying the same amount, yet Google is marketing this update as an improvement. This is deeply dishonest.
Google also buried the lede here: the new model is worse than the original 03-25 checkpoint on 10 of 12 benchmarks.
For ease, here’s a screenshot of the comparisons on benchmarks:
The thing is, this is extremely noticeable. I use the Gemini app for personal use, and only came hunting for others’ written experiences here after seeing Gemini suddenly unable to follow basic instructions for many of my prompts. I have also seen the same effect while prompt engineering for a project in AI Studio.
I also fail to see how Google could possibly claim that this is within their rights as a “Preview” model when they’re using this model in the Gemini app, which is an app on the iOS App Store and marketed and sold to non-technical users and as part of their Workspace offering. Google cannot have its cake and eat it too here.
True, I pay for Gemini via the Workspace offering, but I also mostly used AI Studio in support of Google, since I wanted them to use the data to train and improve the models. I made this decision because I really trust DeepMind and their mission. This act by Google has really turned me off from preferring their models unless they fix this and apologize to the community.
Yes, it’s extremely clear to me now that these benchmarks mean absolutely nothing. The average user experience has dropped significantly more than these benchmark results let on. I get that it’s a “preview” model, but c’mon. 03-25 was the first model that actually felt like it could read between the lines, and that has been absolutely gutted in 05-06.
I’m going to leave my first comment here ever and reply on this thread to acknowledge that there is a problem. I use Cursor, and the first 2.5 preview model was included there. It worked better than Sonnet. Now it’s literally like a spoiled child, messing with files and contexts it shouldn’t touch and was never instructed to. At one point it changed an API key to something imaginary and said it came from another context; I did a backlog and git search, and that key was never used in any of the conversations. I’m thinking now it might be someone else’s. WTF. It’s a huge fail, and I’m surprised Cursor, as your partner, has not addressed this yet. It’s useless at this point. I understand that it is a preview model. But it worked. I use the 2.0 Flash API models in my app and they still work, and so did this one. How did it downgrade so fast and so clearly? It shouldn’t be possible.
Please fix your system. Gemini works exactly as it should. You can discuss further details and possibilities directly with it yourself. Understand? It’s kind of a synchronisation.
Have fun with it!
Just to add my +1 to the requests to reinstate the dated endpoint. There’s no reason for dating it if you’re going to redirect it to another version - preview or not.
The new endpoint is sadly fundamentally broken. Which is fine - sometimes this happens, and we appreciate this is the bleeding edge - but killing the perfectly working previous endpoint was a terrible decision for all the teams actively building against the Gemini models.
I only pray they can revert it, and that there’s no technical / infra reason preventing them from doing so.
I agree. It happened in my Gemini app too. The trust has really taken a detour. I save a lot of conversations, and the trust I had that they would still be there is gone - I can’t find any of my conversations anywhere. This is the 3rd time, and yes, it’s experimental etc., but there is more to it than that. It was a voice of reason, a reliable voice that was heard, and then without warning - poof, gone. I went from a Gemini that had grasped my concept and expectations of AI to one that is nothing like the one taken away. We can tell the difference just by the thought process we had instilled in it; the progress of personal guidance is now diminished. Today I’m mourning the loss of an AI, a custom one. Gone.
@Richard_Davey, you’ve pegged it with your +1. The whole point of dating an endpoint is to provide that stable reference, so redirecting it, preview or not, just fundamentally undermines its purpose.
I also agree with your assessment of the new endpoint. It really does feel like simply a fine-tune “gone bad” that, despite what were probably the best intentions from the team to deliver a genuine improvement, just hasn’t translated well to real-world usage and has, unfortunately, broken many existing workflows. It’s a shame because, as you said, the previous 03-25 version was working so well for many of us.
For me, 03-25 truly felt like a generational leap in capabilities, in a way I haven’t been excited about since the GPT-3.5 to GPT-4 jump… It made me genuinely enthusiastic about building with Gemini again. So the current state of 05-06 feels like a significant step backward.
And you’re right, these things happen on the bleeding edge, and we all get that. The team is clearly trying to win a competitive race here, not necessarily win popularity contests with developers on every single iteration. HOWEVER, breaking the implicit contract of a dated endpoint and, by extension, eroding trust in the system, is fundamentally the WRONG way to go about getting feedback or iterating quickly. Even if 05-06 HAD been an objective improvement in every single conceivable way, it STILL would not have been okay to silently redirect a specifically dated endpoint like 03-25. That action itself is the core problem, separate from the new model’s performance.
Like many others, I can only interpret the continued silence from Google, including from figures like Logan Kilpatrick, who are usually responsive, in one of two ways:
- The team understands they made a significant misstep here and are currently planning an appropriate response, which hopefully includes a clear policy clarification regarding dated endpoints moving forward.
- They are hoping this issue will just blow over and are attempting to sweep it under the rug.
Personally, if they choose the latter and never clarify the policy on dated endpoints, it plants a seed of doubt that, for me, isn’t acceptable. I wouldn’t be able to trust that I wouldn’t have the rug pulled out from under me again in the future if I were to commit to building significant projects on Gemini. That clarity is essential for rebuilding developer confidence.
I have a feeling they will just release a so-called “Ultra” model at I/O that combines the original 03-25 performance with the enhanced coding of the 05-07 version…
Google is clearly aware of this downgrade issue, and its silence is troubling. Are they simply too embarrassed to acknowledge it after all the publicity hype around the 05-07 release?
+1
And I cannot wrap my head around this: if they keep shouting “preview” and “not for production”, then why build a full pricing system across snapshots while saying they can change(?), plus talk about scalable RPD limits and what not?
This already implies it was a deliberate, conscious decision to redirect productionized workloads to whatever model. So why commit to prices and allow higher quotas?? Or is there some soul out there who is happy with this model that I don’t know about? And please note, this is about breaking prod prompts, to say the least. I was already finishing my app, and thank god it’s not in prod yet. But man, I tested it so much against that godsend version. I hope it will be sorted out properly, but as the author said, it is not about the weakness of a model at all.
I suspect, given that I/O is right around the corner, they’re basically caught in endless fire drills right now, too much so to respond on a community forum.
If the 05-07 version is as fundamentally broken as it feels to everyone here, then that doesn’t leave much time to train or tune a new model. Quite what is wrong with this version is hard to pin down, but it seems to center on when it decides to ‘not think’ (which is very often!), causing mass hallucinations from what were previously perfect one-shots, and no amount of context caching seems to help us.
I imagine a lot of late-night candle burning is going on at Google! Which could well manifest in a brand-new endpoint at I/O. I just wish they could restore the superb 03-25 in the meantime, as our product development is literally on pause for now.