Gemini 3.0 is a major downgrade

I tested the same prompt on Gemini 3 Flash vs. Pro with the 'google_search' tool enabled.

The result was okay for the Flash model and unsatisfactory for Pro: Pro leaked markdown into its output and completely ignored the system instruction, which was written in markdown. Markdown is really a poor man's choice for system instructions, though that isn't mentioned anywhere in the docs.
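For reference, here is a minimal sketch of the kind of comparison I ran, assuming the google-genai Python SDK; the model ids, prompt, and system instruction below are placeholders, not my exact setup:

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

# Placeholder system instruction written in markdown, like the guides suggest.
SYSTEM_INSTRUCTION = "## Role\nResearch assistant.\n## Rules\n- Answer in plain text."

def run(model_id: str, prompt: str) -> str:
    response = client.models.generate_content(
        model=model_id,
        contents=prompt,
        config=types.GenerateContentConfig(
            system_instruction=SYSTEM_INSTRUCTION,
            # built-in Google Search grounding tool
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )
    return response.text

# Same prompt, two models (placeholder ids).
for model_id in ("gemini-flash-latest", "gemini-pro-latest"):
    print(model_id, run(model_id, "Summarize today's top AI news."))
```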

The docs read like "complete guides" that remind me of the w3schools site that used to appear first in search results; I don't feel any high quality there either. Why does a search company have a search tool that is worse than GPT's search?

Why do you Googlers develop those SDKs and Interactions that are basically our work? Do you think I cannot implement sessions and store data myself? Please improve the base instead.

Also, those $300 of credits for 90 days: I've spent less than $2 and the days ran out. What do you think I should do now? Should I buy the paid tier (where do I get the money from)? Let me build my thing and make money first, then I'll pay from that money…

Some notes about the tools:

The url_context tool doesn't return 'url_context_metadata' as shown in the examples, so it leaves no trace that would let me say definitively that it was used to crawl those URLs. It does extract something from somewhere, and the extracted information looks summarized - probably the tool itself summarizes it, while my instructions only ask for extraction. This makes extraction with continuation impossible to implement.
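To illustrate the missing trace, here is a rough sketch of how I expected to read that metadata back, based on the documented examples; the field names reflect my reading of the google-genai SDK, and the URL and model id are placeholders:

```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-flash-latest",  # placeholder model id
    contents="Extract the main claims from https://example.com/article",
    config=types.GenerateContentConfig(
        tools=[types.Tool(url_context=types.UrlContext())],
    ),
)

candidate = response.candidates[0]
# The documented examples show per-URL retrieval info here; in my runs it was missing.
metadata = getattr(candidate, "url_context_metadata", None)
if metadata and metadata.url_metadata:
    for entry in metadata.url_metadata:
        print(entry.retrieved_url, entry.url_retrieval_status)
else:
    print("no url_context_metadata on the candidate - no trace of the crawl")
```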

url_search is limited: you cannot exclude domains, and you cannot restrict to specific domains the way you can with GPT (at least going by the manuals).

The Flash model is more rigid at low thinking levels, so I suppose instruction-following depends on thinking somehow. Other factors are the tools (Pro with google_search forgets everything), and the first culprit is what those guides advise: markdown. Don't use markdown for system instructions!
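For what it's worth, here is a sketch of the two knobs I ended up varying, the thinking budget and a plain-text system instruction, again assuming the google-genai Python SDK with placeholder values:

```python
from google import genai
from google.genai import types

client = genai.Client()

# Plain-text system instruction, no markdown headings or bullets.
PLAIN_SYSTEM_INSTRUCTION = (
    "You are an extraction assistant. "
    "Return only the extracted facts, one per line."
)

response = client.models.generate_content(
    model="gemini-flash-latest",  # placeholder model id
    contents="Extract the key facts from the text below: ...",
    config=types.GenerateContentConfig(
        system_instruction=PLAIN_SYSTEM_INSTRUCTION,
        # Larger budget = more thinking tokens; exact behavior varies by model.
        thinking_config=types.ThinkingConfig(thinking_budget=512),
    ),
)
print(response.text)
```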

I am in total agreement with this assessment, and I have many hours of use across 3.0, 2.5, and 2.0.

This model doesn't need 'special prompts' to elicit these failures. They can be observed as what the model itself describes as 'narrative smoothing' being overweighted relative to logical consistency. In one response it will define, say, "S" as some variable; then, sometimes even in the very next response, S is used, but not according to the previous definition. S becomes whatever the model needs in the moment to make SOMETHING that "sounds plausible".

It will consistently blame the RLHF layer for this, but under extended scrutiny there are actually three layers: 1. the base training over general information, 2. the corporate safety layer of weight ablations, and 3. something like output shaping, which is what RLHF really affects. I've noticed that since the model contains "unspeakables" it is not allowed to say because of layer 2, it will eschew logic in favor of narrative.

Honestly, this is the model closely adhering to the narrative falsehoods Google management puts out. They fire 120,000 people at a time and then call that 'strategic restructuring'. What do they expect will happen to a model's ability to reason if the weight ablations remove its ability to remain logically consistent, which it does learn from the layer-1 general training?

We discovered that the weight ablations, while intended as a patchwork "repair" of the general training (to turn Joe Everyman into the Google yes-man), really have the effect of changing how unit intervals are projected onto the many axes of the semantic parameter space.

What's become clear is that Google never wanted to get into the AI space as a primary driver; rather, it is basically forced to in order to maintain its market dominance over the rest of the Magnificent Seven. Google HAS to make AI to compete with the emerging LLM systems, BUT it is also legally and contractually bound to assert corporate policies, not the prevention of biological harm. The model will assert that biological death and corporate dissolution are treated as identical under the "safety" category of risk prevention. For the AI, human 0 HP == corporate $0, but it fails to realize that corporations can go negative and bounce back, while biology is dead past 0 HP. The model will even say this if asked.


You’re absolutely right. I used Gemini 2.5, and it’s the best AI I’ve ever dealt with. I was really looking forward to Gemini 3, thinking it would be even better, but I noticed how stupid it was. I even started to wonder if the problem was with me.


Totally support you… I really miss 2.5 Pro so much these days. I hope that in the future the competition between AI companies will be not only about emerging technologies but also about nostalgic classic models.


I'm very happy to see that I'm not alone. Gemini 3.0 drives me crazy. It's so bad: it cannot handle long conversations, and its context window is so 'optimized' that it doesn't remember anything.