Same here @DaveFL, it appears they removed it.
I ran Aider benchmarks on Rust yesterday comparing exp to preview and was disappointed, to say the least. This morning, on a tip, I ran the same set of benchmarks using a temperature of 0.5, and the differences are quite large. Preview performs very well with this adjustment.
I hadn’t noticed a change until yesterday using https://gemini.google.com/app but the switch finally happened and the difference in model output and general ability is drastic and noticeable. I hadn’t had a single hallucination with 03-25 and its ability to generate functional code was top-of-the-line.
The latest version is abysmal compared to 03-25 and absolutely unusable. I attempted to get it to write a simple Dockerfile to install Tailscale and establish a TCP connection to another Docker container and it entirely failed at this. I did not realize the model had changed (so this was a completely blinded test) and it could not make a working version of this file to save its life. It was riddled with hallucinations and did not bother to look through my actual codebase.
At one point, it suggested that I go back and retry a bash script that it generated that I assured it multiple times did not work. I wholeheartedly believe 2.5 Pro 03-25 is revolutionary and will completely change the way SWE and coding teams operate, but this new version is not the same at all.
@Dylan_Rollins Have you tried using AI Studio or the API and setting the temperature?
I am finding that 0.6 works best with 05-06 and 0.5 works best with 03-25, though 0.6 also works well with 03-25.
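For anyone who wants to try these values through the API rather than AI Studio, here is a minimal sketch of how the request could be put together. It only builds the URL and JSON body for a `generateContent` call and does not send anything. The model identifiers and the `build_request` helper are my assumptions for illustration, so check the current model list before relying on them.

```python
import json

# Assumed model identifiers for the two snapshots discussed in this thread,
# mapped to the temperatures suggested above. Adjust to whatever IDs are live.
SUGGESTED_TEMPERATURE = {
    "gemini-2.5-pro-preview-05-06": 0.6,
    "gemini-2.5-pro-preview-03-25": 0.5,
}

API_BASE = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(model: str, prompt: str) -> tuple[str, str]:
    """Hypothetical helper: return (url, json_body) for a generateContent call."""
    temperature = SUGGESTED_TEMPERATURE[model]
    url = f"{API_BASE}/{model}:generateContent"
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        # Temperature is set per request inside generationConfig.
        "generationConfig": {"temperature": temperature},
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    url, body = build_request("gemini-2.5-pro-preview-05-06", "Write a Dockerfile.")
    print(url)
    print(body)
```

You would then POST the body to the URL with your API key using whatever HTTP client you prefer.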
“Modified by moderator” @Logan_Kilpatrick
Today’s I/O is a complete disappointment. No actual groundbreaking model was released apart from a 2.5 Flash and a ‘coming soon’ deep thinking model.
Where is the original 03-25? Google is clearly aware of the regression “Modified by moderator”
At least OpenAI has the guts to admit their update failures and roll back to the old 4o.
@Fr_L I get how it seems that way, but I think you’re starting from a mistaken premise. Sundar Pichai made it pretty clear at the beginning of the I/O keynote that they’ve pivoted how they handle events. Big model announcements and releases aren’t being saved for I/O anymore, they’re dropping throughout the year without notice, like the March 25 release of 2.5 Pro, in case you missed that.
The event is more about the surrounding integrations, use cases built around the foundational models, and updates to user-facing products, capability expansions, etc.
Veo 3 and Imagen 4 are pretty fricking amazing models, and the new Flash model is nothing to be sniffed at. But yeah, I hear you.
I think right now it’s perfectly clear that 03-25 is never coming back.
So it’s time to start building on the current Flash and Pro and wait for them to hit GA in June, by which time we can only hope that one of the 6 (IIRC) new models they’ve currently got cooking is ready to pop out of the oven and shine in the same way that 03-25 did.
I will try it out with these parameters. Thanks for the suggestion!
We see that the models require relatively low T values (0.2-0.4) to even function (even if you would not be aiming for “low creativity” for a task). With Temp 1.0, the current Gem 2.5 Pro preview model often produces nonsense or artifacts that I haven’t seen much elsewhere.
We use 2.5 Pro with a temperature of zero for any coding-related task or any task that needs to call tools; otherwise, we’ve found it goes haywire more often than not. Flash, on the other hand, has no such issues.
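In case it helps, that rule can be written down as a small lookup. This is only a sketch of the policy described above: the `Task` enum, the `temperature_for` function, and the 0.4 fallback are all hypothetical placeholders, not anything from Google's SDK.

```python
from enum import Enum


class Task(Enum):
    CODING = "coding"
    TOOL_CALLING = "tool_calling"
    GENERAL = "general"


def temperature_for(model_family: str, task: Task, default: float = 0.4) -> float:
    """Pick a temperature: pin Pro to 0.0 for coding and tool calls.

    `default` is a placeholder for every other case (including Flash,
    which reportedly does not need the override); tune it yourself.
    """
    if model_family == "pro" and task in (Task.CODING, Task.TOOL_CALLING):
        return 0.0
    return default


if __name__ == "__main__":
    print(temperature_for("pro", Task.CODING))
    print(temperature_for("flash", Task.CODING))
```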
Now Google has even eliminated CoT. Not a lick of thought put into recent modifications. Wonder if the decision process in the dev department has changed or something. All of these need to be rolled back.
I’m more curious about why… What do they gain from doing this?
You mustn’t have started playing with native audio “Modified by moderator”
Is 03-25 still usable on Vertex?
Do you think they will keep the 03-25 preview even when the production version of 2.5 Pro is released in the next few weeks (which may still be the worse 05-06)?
Yes, for now, 03-25 (preview) is still usable on Vertex and is the real snapshot. Few people use it because of the barrier to entry. It performs amazingly well for me. As for whether it will stay up, who knows.
Thank you!
And you are referring to the preview version, not the exp (free) version, right?
The new model is atrocious!!! It consistently forgets which files I uploaded (even though they are visible in the interaction window’s file tab) and will focus on previous prompts rather than the current one. It barely functions. It is simply incredible you haven’t rolled back to the previous model while you work out the kinks. Stop being so prideful and listen to your users’ feedback!!!
Logan Kilpatrick has effectively admitted Google realizes 05-06 (“I/O edition”) is inferior to 03-25, but still won’t serve it. He also states that the new GA release coming up will “close the gaps” between 03-25 and 05-06. I can only take this to mean they expect it to still underperform 03-25, but this is what we can expect, and “take it or leave it.” Pretty disappointing.
Unless someone else sees this a different way.
“Modified by moderator”
God, I just… I really hate them. It feels like they’re just mocking us, but I still keep hoping things will actually get better. And the weirdest part is, everyone else seems totally fine with it – people are actually thrilled! There just aren’t enough of us. Our complaints won’t make a dent; people are still going to trust them no matter what. And us? We’re just a small bunch, probably more hassle for them than we’re worth, so… why am I even bothering to write this? I guess I just put too much faith in Google, and this whole thing has been a massive letdown and has really shaken me up.
The best possible interpretation of all this is that Google are just terrible, TERRIBLE communicators. Enterprise is where the real money is for AI, not free stuff. This stuff badly impacts enterprise users too. Google WILL listen to user feedback or they will lose a LOT of money.
Or maybe the enterprises like this, I don’t know.