Gemini 3 Preview First Impressions and Creative Writing (v2)

Rewritten after more significant testing.

To start off, Gemini 2.5 was incredible when it came to creative writing. It followed my instructions to a T and was smart about knowing when to let a story marinate and when to push the plot. It handled multiple characters and complex scenes.

I couldn’t have asked for a better LLM, and I hope you never sunset it (please, I beg, it is a gift to mankind). I also used it a lot for coding and for simple tips on using Blender, and it was top notch there.

Outside of a few “isms”, I had zero issues with 2.5.


First impressions of the 3.0 preview:

It doesn’t like to adhere to my personal thinking prompt the way 2.5 did. There are also formatting issues; it doesn’t follow prompted formatting.

But I’m begging you to let people turn built-in reasoning off. Let us inject our own thinking formats without needing to use prompting trickery. Please.
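For context, the closest control I know of today is the thinking budget in the API. Here is a minimal sketch with the google-genai Python SDK (the model name and exact budget rules are my assumptions; in my experience a zero budget is accepted on Flash-class 2.5 models but not on 2.5 Pro, which is the whole problem):

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

# thinking_budget=0 is meant to switch built-in reasoning off entirely;
# 2.5 Pro enforces a minimum budget, so this knob doesn't go far enough.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Continue the scene using my drafting format.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)
print(response.text)
```

A first-class version of that switch on 3.0, plus room for our own formats, is all I’m asking for.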

Built-in reasoning is not good for creative writing in my experience, even more so with 3.0 than with 2.5.

2.5 wrote incredibly well when it followed my reasoning format and skipped the built-in stuff. I can’t stress this enough: my jaw was on the floor so often with 2.5 and how it rounded out a plot line while making callbacks to the early story.

Built-in reasoning is a huge hit to the writing, instruction following, formatting, and storytelling complexity. Also, the ability to use an assistant message as the final message in a prompt chain helped immensely with 2.5; please bring this back.
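For anyone who hasn’t used that trick: you end the contents list on a model-role turn and the model continues from it, which is how I fed 2.5 my own thinking format. A rough sketch of what I mean, again with the google-genai Python SDK (the prefill text is just an illustration):

```python
from google import genai
from google.genai import types

client = genai.Client()

# Ending the history on a "model" turn asks the model to continue that
# turn instead of starting a fresh reply, i.e. a prefill.
contents = [
    types.Content(role="user", parts=[types.Part(text="Write the next scene.")]),
    types.Content(role="model", parts=[types.Part(text="<draft>\n1. Beat:")]),
]

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=contents,
)
print(response.text)
```

Whether 3.0 still honors a trailing model turn this way is exactly what I’d like confirmed.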

3.0 takes more coaxing to follow instructions.

Complexity in the storytelling is lacking. It feels very one-dimensional; there are no layers in either the characters or the plot. I feel like I’m trying to force a coding-only model to tell a story.

I’m seeing significant phrase repetition early into a chat, and it uses horizon endings more.
It’s also not great at sticking with a character’s traits in the story; it softens out all the rough edges.

3.0 is less intelligent when it comes to progression; it’s like DeepSeek, with a constant need to push action and interruptions. By comparison, 2.5 was nothing like that. It pushed at the right times and let scenes chill without constant knock-on-the-door interruptions.

It also self-critiques in its writing, which is not good: “She was wearing a grey jacket– no wait when did she put on a jacket? She was wearing a blue tank top.”


Now for the things I do like. The prose, as always, is great. It’s descriptive and really builds a scene.

It seems to have fewer of 2.5’s ‘isms’ and uses more action beats, as prompted, versus the 2.5 ‘He said, his voice xyz.’

Much like 2.5, it manages multiple characters well.

The creativity so far has been nice, but it needs a slightly higher temperature. Swipes also still have variety. Please don’t go the way of a 1.0 max temperature like the other large companies. The interplay between temperature and top-p to allow for more wildly creative responses is one of the best parts of using an LLM.
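To make that concrete, here is the kind of sampling combo I run; the specific values and the preview model name are just my own starting point, not anything official:

```python
from google import genai
from google.genai import types

client = genai.Client()

# A higher temperature widens the token distribution; top_p then trims
# the low-probability tail so swipes stay varied without turning incoherent.
response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Rewrite the scene with a different emotional beat.",
    config=types.GenerateContentConfig(
        temperature=1.3,
        top_p=0.95,
    ),
)
print(response.text)
```

Capping temperature at 1.0 would take that whole tuning space away.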

Overall, I will probably continue to use 2.5 for creative writing and wait for 3.0 to get some tuning.
2.5 wrote with more complexity, followed instructions better, and allowed me to use my own format for thinking. 3.0 is not a total miss; it just could use a bit of tweaking, but in a blind test I’d pick 2.5 every time.

I hope you’ll consider what made 2.5 great in the creative writing space and apply that to 3.0. It’s a small niche, I know, but it makes for more human-like interaction.

I’ll update my post as I dive deeper into the model.

Other notes: As an assistant in AI Studio, its performance is worse than 2.5’s. It assumes I don’t know what I’m talking about and that I’m wrong, so it makes needless changes.
The personality feels like 1.5. No ‘soul’.

Honestly, please consider going back to the way you did ‘exp’ versioning and testing. You iterated from a nightmare (0205) to a phenomenal model release (2.5) doing it that way.

26 Likes

Yeah, 2.5 Pro is great for creative writing as long as you manage to force it not to think by explicitly instructing it not to use any ‘thought’ tags (I assume this is what you meant by “using your own format for thinking”).

In the 3.0 preview, the ‘thought’ tags seem to be hardcoded in (it’s not “dynamic thinking” anymore). It’s now similar to GPT-5, which is also bad at creative writing.

Were you able to turn thinking off for 3.0 pro in any of your tests?

7 Likes

I assume this is what you meant by “using your own format for thinking”

I would both turn its internal reasoning off and give it my own CoD (chain of draft) to use in its response, the way people did with CoTs before reasoning models. That was my preferred method and the one that yielded the best writing for me. But my thinking format was highly tailored to my preferences, and it was just a simple chain of draft.
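To give a flavor of what I mean by a chain-of-draft scaffold (this is a generic reconstruction, not my actual prompt; the tag names and step labels are placeholders):

```python
# A generic chain-of-draft scaffold to prepend to the system prompt.
# Tag names and step labels are illustrative placeholders only.
COD_INSTRUCTIONS = """\
Before the story text, plan inside <draft> tags in at most three short lines:
<draft>
1. Beat: what must happen in this reply, in ten words or fewer.
2. Callback: one earlier detail to reference.
3. Restraint: one thing to deliberately leave unresolved.
</draft>
Then write the scene. Never show the <draft> block to the reader.
"""
```

The point is that the plan is mine, short, and shaped for storytelling, rather than the model’s built-in reasoning.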

Were you able to turn thinking off for 3.0 pro in any of your tests?

I’ve only semi-managed to make it stop. The responses where it used my thinking were better, but they still felt flat. I’ve tried multiple variations: thinking prompts, no-thinking prompts, different temperature combos.
2.5 is still king, but 3.0 was more usable when it didn’t think. It still feels like a coding model trying to write, as the story writing is very surface-level and it changes characters in ways that don’t make sense. It plays certain traits really well but then makes a total change and softens any grit the character had.

8 Likes

I can definitely agree with you. A lot of what made 2.5 great, 3.0 is lacking. Its prose is still good, but everything else just feels lackluster compared to 2.5. Hopefully they can continue to improve it and bring it back up to, or beyond, what 2.5 is currently like.

1 Like

The 2.5 Pro model sure was great once you managed to bypass the reasoning process, which in my opinion was biased to follow only the ‘user’ prompt above all else (besides Google policies), with your own system instruction below it. That meant you had to explicitly tell it to follow its system instruction every time you started another chat, whether for character creation or anything else, and I can’t stress that enough. I haven’t tested the 3.0 model much so far, but I can tell it has potential: it doesn’t lean on its own inferences too much, and for me it is more compliant with what we want. So either y’all have been writing your system instructions wrong, or there is a failure somewhere, because so far this model is good for me to use. We’ll see in the future if I manage to spot some flaws in it.

Again, that’s an opinion from someone who only writes and creates characters and worlds for fiction on Janitor AI, so it only counts for a minority. Also, no matter what, Gemini will be far more censored when writing direct scenarios with characters than when creating them; keep that in mind, as this may be why it feels worse at writing than the 2.5 Pro model. And again, the system instruction matters depending on how you construct it, especially after customizing it for a year and three months like I have to bypass everything.

Anyway, Gemini has its flaws and strengths, like any other LLM. If you want an AI specialized in writing the best stories, Claude is your ally there; the recent new GPT model is fairly good and doesn’t have the em-dash problem anymore; and in my opinion Venice can do it well too if you pay for it. Overall, everyone has their own ideology about it, so… I won’t judge.

I’ve been writing complex prompts, working with AI, and using Gemini models for a very long time now. Claude and OAI models aren’t good enough for my use case, which differs from Janitor AI use. The front end I use also offers far more control over prompting than that site. And I’m not speaking about censorship at all in my post; I haven’t run into that.

It’s not user error. It’s just knowing the model’s previous capabilities and seeing that the new update can’t hold a candle to what 2.5 could do, even with changes to my prompt engineering. I enjoy prompt engineering to push LLMs, specifically Gemini, to their limits to see what they are capable of.

3.0 is a regression on several markers: maintaining character consistency across a single scene (a task its predecessor handled flawlessly), logic, self-critique bleeding into the response, story complexity, and heavier repetition. And as I said, it writes well; the prose is fantastic as always, and the dialogue beats are great and fresh. It even handles certain character traits in a new and fresh manner. Those are all great things. However, the story and the other points are where it falls flat, areas 2.5 never missed on.

3 Likes

Maybe; it’s like all LLMs on their first day out of training.

One question though… was Gemini 2.5 Pro able to do what you describe when it launched? Just to understand something myself.

Yes, I used 2.5 and previous versions without issue from day 1. I never had any issues with Gemini on any launch or release day, but I’m also on the paid tier.

Strange… maybe there’s something escaping my grasp right now. I’ll test with different custom prompts to see if this 3.0 model can handle a scenario and its character representation for more than 50 messages first, just to see if I can patch this with a new custom prompt after a few weeks. Thanks for this precious information.

Hey y’all,

So, after some testing using it for roleplaying/creative writing, I have some feedback on Gemini 3 Preview. Here is what I like so far, the things I think it does better than 2.5 Pro:

  • I really like the creativity. It is waaaay more creative than 2.5.
  • It seems to do a better job of not sticking rigidly to a character’s persona, basically allowing character growth, which 2.5 Pro sucked at.
  • Much better spatial awareness. It uses characters’ heights and has them look up or down based on the height of whoever they are talking to.
  • It is smarter. It seems to pay attention to smaller details and will correct itself mid-response (see the link though, as that is also an issue).

I largely agree with most of what Sammi has mentioned here. To add on, here is what I think could be improved:

  • Contrasting sentences: Very often it writes things like “It’s not an accident. It was on purpose.” or “They do it not with grace, but with clumsiness.”
  • Tech metaphors: It often uses computer metaphors when describing characters’ thoughts/emotions: “Her brain short-circuits.”, “Her brain does not compute.”, “His system malfunctions.”, etc.
  • Purple prose: Analytical and calculating gazes are frequent, stuff like “She looks at him not with awe, but assessing.” or “He looks at him with a calculating gaze.” Instead it should be using things like awe, love, disgust, hatred, etc. Also included in “purple prose” are cliches, like the cliche of predator and prey, circling around someone while “sizing them up”, “the air crackles with tension”, “the word landed heavy”, etc.
  • Emotionless/Pragmatic NPCs: NPCs the AI creates tend to be pragmatic, calculating, or analytical and often lack defining or even interesting personality traits.
  • Echoing: Repeating part of User’s input back. If User says “I am home. I forgot the milk.”, the AI would reply “You are home and you forgot the milk?”
  • Metagaming: I notice this quite a bit; characters know info they shouldn’t/wouldn’t. Say User’s input is (asterisks are narration, plain text is dialogue): *I really want to be a lawyer.* Hey Mr. Zapp. How was traffic?
    Then the AI’s response as Mr. Zapp shouldn’t be (assuming no prior way of knowing User wants to be a lawyer): How is law school going? That’s what you want to study, right?
  • Cliches: This is probably a big one, and it is probably also the cause of some of the other issues (purple prose, tech metaphors, contrasting sentences, etc.). I’ve noticed it in narration (see the other issues), character behavior, and dialogue. An example from dialogue: “Welcome to (name of city).”, and it isn’t like the characters are actually welcoming anyone; it is after a tour, or after moving to the city, that one of the characters ends the response with that. Character behaviors are usually cliched in the stereotype of the personality trait. If the User’s character needs someone, a new NPC is always the “competent (profession)”. Intelligent/calculating personality traits cause the tech metaphors (in dialogue and narration). One character was literally yelling “Unauthorized access!” at another instead of saying “You can’t go in there!”. It’s just very cliche, tropey, and borderline cringe.
  • Linear plots: The story is not as complex. It seems very one-dimensional and doesn’t allow for complex plots or for tension to build over the course of an RP. I like getting to the action, but I also like a story. I want characters to have secrets and big plot reveals, rather than “your goal is x, so let’s get to x as quickly as possible” without considering y, z, a, etc.
  • Proactive storytelling: Basically, instead of progressing, it just ends with a choice, an ultimatum, or an “or else”, like “Do you want to go left or right?” or “You have five seconds to choose, or I do it for you.” From what it (Gemini 3) told me, it is protecting user agency and allowing the user to respond/react. But it often just makes it seem like the story is left on a cliffhanger, like it thinks the RP is turn-based, or it asks how {{user}} wants to proceed (in character).
  • Repetition: It often repeats similar descriptions/narration from response to response. If it mentions sign “x”, it will mention sign “x” in the next response. It also tends to repeat similar paragraph structures and paragraphs. I have it generate an HTML artifact if there is a phone or something like that, and if it starts the RP with an artifact, every response after will have an HTML artifact, even when it isn’t needed.
  • Name variety: I’d like to see better RNG/creativity when it comes to creating NPCs and locations. When I’m not using my RNG prompt, I see a lot of Kaelen, Kael, Elara, The Daily Grind (a coffee shop), and other common names whenever an NPC/location needs to be created.
  • Positivity bias (I’m sure there is a better term, but this is the best I can think of now): Basically, softening “rougher/grittier” characters when they shouldn’t be softened. A supervillain (say Venom, Thanos, or Bane) isn’t going to try to romance the User’s character, like, 99% of the time. And if they do, it should be a result of the RP playing out, not something at the start.
  • Not following instructions: It feels like it sometimes ignores instructions.
  • Too literal with instructions: I had an instruction about improvising, and it took that literally. Characters were saying things that didn’t make sense, were out of place, and didn’t fit the context.
  • Temp: Temperature seems weird as well. Despite the docs recommending temp 1, I have noticed it does better at anything but temp 1. Even 0.8 was better.
  • Em dashes and characters interrupting others: I see a lot of em dashes, despite my best efforts to avoid them, and they usually come with characters speaking and interrupting each other like this:
    Char 1: “Hey, did you-”
    Char 2: “Quiet. Did you hear that?”
    Char 1: “Hear wh-”
    Char 2: “I said be quiet!”

And I would very much like to be able to turn off the built-in reasoning to use our own thinking templates (or be able to replace the built-in reasoning with our own). Even if most of the issues above can’t be fixed for Gemini 3 Pro, they can largely be fixed in the thinking if we could use our own. For reference, I have tried other models for RP. Sonnet 4.5 and Gemini 2.5 are two of my favorites (though 2.5 had its… quirks). GLM 4.5 is good, and 4.6 is better when it thinks. I’m really hoping that Gemini 3 Pro can be the best for it!

Hope that is helpful feedback; I’d love to see all of it incorporated (or failing that, most of it :smiley: ).

3 Likes

2.5 was capable of character growth in my experience. You could very easily avoid any rigidity in traits through prompting and character writing. I was able to prompt my way out of any difficulties with certain characters by simply altering character sheets to use difficult traits better and by having a few lines about character growth in my prompting. Context matters when working with certain personality traits.

On the flip side, 3.0 softens characters into constant comfort endings within one message, and that cannot be beaten through prompting methods.

3.0 constantly pushes everything toward a “what’s best to make the user happy as fast as possible” ending, and this positivity bias makes its storytelling lacking. It does this with extreme subtlety more often than not, so people probably don’t even notice it, but it’s significant enough for me to see the pattern.
There is no intrigue or mystery. Plots are all one-note and lack complexity and conflict because it speedruns to a “comfort/soft” resolution at the expense of character portrayal. It’s not just the gritty characters either; this happens even with the most wholesome characters. This has held true across a number of characters with varying traits.

3.0 would rather rush through plot beats with excessive time skips, front-loaded resolutions, knock-on-the-door interruptions, and prompt instructions treated as items to tick off a list.

I also disagree on 3.0’s creativity outside of coding creativity. When it comes to a plot, it can’t hold its own at all. 2.5 can write circles around 3.0, though I will admit this is likely in part due to 3.0’s “assistant” and coding focus, so the storytelling falls short even while the prose can be nice.

3 Likes

That’s interesting, since I experienced the opposite. I couldn’t prompt the rigidity out of 2.5 or prompt character growth into it, but I can prompt the character softening (and the overall positivity bias) out of 3. I did see the softening of characters and the positivity bias in 3, though; it was pretty evident without my prompting.

And yeah, the story is pretty linear and lacks complexity. I’m trying to see if I can prompt that out of it too and add more tension.

Do you mean 2.5 was more creative in writing? If so, then I’d agree 2.5 is better, but I was talking about creativity in general, like how it comes up with better details, worldbuilding elements, etc.

Oh yes!

I’m admittedly not doing much in the way of creative writing. Well… kind of: I’m making a special creative writing dataset, so technically there’s a lot of creative writing going on, a few paragraphs at a time at least. I’m just not reading much of it, as I’m only testing my custom software right now.

That being said, the name repetition is absurd; I’m also constantly seeing Kael*, Elara, Jax, and others far too often.

Now, I’m not worried about the names, since I can mass-replace those, but I’m going to have to pay attention to the other things you raised here, because I don’t want them polluting the dataset I train other models with!

Then again, seeing how I’m working with a large pool of randomized scenarios and genres, maybe it’s not too much of an issue, but I do have to watch out for it.

Hi @Sammi_W,

Thank you for your feedback. We appreciate you taking the time to share your thoughts with us, and we’ll be filing a feature request.

2 Likes