Rewritten after more significant testing.
To start off, Gemini 2.5 was incredible at creative writing. It followed my instructions to a T and was smart about knowing when to let a story marinate and when to push the plot. It handled multiple characters and complex scenes.
I couldn’t have asked for a better LLM and hope you don’t ever sunset it (please, I beg, it is a gift to mankind). I also used it a lot for coding and for simple tips on using Blender, and it was top notch there.
Outside of a few “isms”, I had zero issues with 2.5.
First impressions of the 3.0 preview:
It doesn’t adhere to my personal thinking prompt the way 2.5 did. There are also formatting issues: it doesn’t follow the formatting I prompt for.
I’m begging you to let people turn built-in reasoning off. Let us inject our own thinking formats without needing to resort to prompting trickery. Please.
Built-in reasoning is not good for creative writing in my experience, more so with 3.0 than 2.5.
2.5 wrote incredibly well when it followed my reasoning format and skipped the built-in stuff. I can’t stress this enough: my jaw was on the floor so often with 2.5 and how it rounded out a plot line while making callbacks to earlier parts of the story.
Built-in reasoning is a huge hit to the writing, instruction following, formatting, and storytelling complexity. Also, the ability to use an assistant message as the final message in a prompt chain helped immensely with 2.5. Please bring this back.
3.0 takes more coaxing to follow instructions.
Complexity in the storytelling is lacking. It feels very one-dimensional; there are no layers in either the characters or the plot. I feel like I’m trying to force a coding-only model to tell a story.
I’m seeing significant phrase repetition early in a chat. It also uses horizon endings more often.
It’s also not great at sticking with a character’s traits in the story; it softens all the rough edges.
3.0 is less intelligent when it comes to progression. It’s like DeepSeek, with a constant need to push action and interruptions. By comparison, 2.5 was nothing like that. It pushed at the right times and let scenes breathe without a constant knock-on-the-door interruption.
It also self-critiques within its writing, which is not good: “She was wearing a grey jacket– no wait, when did she put on a jacket? She was wearing a blue tank top.”
Now for the things I do like. The prose, as always, is great. It’s descriptive and really builds a scene.
It seems to have fewer of 2.5’s ‘isms’. It uses more action beats, as prompted, instead of 2.5’s ‘He said, his voice xyz.’
Much like 2.5, it manages multiple characters well.
The creativity so far has been nice, but it needs a slightly higher temperature. Swipes still have variety. Please don’t cap temperature at 1.0 like the other large companies. The interplay between temp and top P to allow for more wildly creative responses is one of the best parts of using an LLM.
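As a rough illustration of the temp and top P interplay mentioned above (a toy sketch, not how Gemini or any provider actually implements sampling): temperature reshapes the token probabilities and top P then trims the unlikely tail, so raising temperature widens the pool of tokens that survive the same top P cutoff. The logits and the function names below are made up for illustration.

```python
import math


def apply_temperature(logits, temperature):
    """Divide logits by the temperature and return softmax probabilities."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def candidate_count(logits, temperature, top_p):
    """Count how many tokens can still be sampled after both steps."""
    probs = top_p_filter(apply_temperature(logits, temperature), top_p)
    return sum(1 for p in probs if p > 0)


# Hypothetical logits for five candidate next tokens (invented for this sketch).
toy_logits = [4.0, 3.0, 2.0, 1.0, 0.0]

for temp in (0.5, 1.0, 2.0):
    print(temp, candidate_count(toy_logits, temp, top_p=0.9))
```

With these toy numbers, a higher temperature leaves more candidate tokens inside the same top P cutoff, which is the kind of variety a low temperature cap would rule out.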
Overall, I will probably continue to use 2.5 for creative writing and wait for 3.0 to get some tuning.
2.5 wrote with more complexity, followed instructions better, and allowed me to use my own format for thinking. 3.0 is not a total miss; it just could use a little bit of tweaking, but in a blind test I’d pick 2.5 every time.
I hope you’ll consider what made 2.5 great in the creative writing space and apply that to 3.0. It’s a small niche, I know, but it makes for more human-like interaction.
I’ll update my post as I dive deeper into the model.
Other notes: As an assistant in AI Studio, its performance is worse than 2.5. It assumes I don’t know what I’m talking about and that I’m wrong, so it makes some needless changes.
Personality feels like 1.5. No ‘soul’.
Honestly, please consider going back to the way you did ‘exp’ versioning and testing. You iterated from a nightmare (0205) to a phenomenal model release (2.5) doing it that way.