Critical Bug Report: AI Studio Gemini Ignores Explicit User Commands (Overrides Safety Gates)

Hello everyone,

I’m writing to report a critical bug I experienced with Gemini in AI Studio regarding its adherence to explicit user commands. This seems to be a significant safety and reliability issue.

The Setup: I was working on a project and established a very clear, hard rule for my interaction with the AI:

“Do not generate or modify any code unless I provide an explicit execution command (e.g., ‘Proceed,’ ‘Apply changes’).”

The AI acknowledged this rule and confirmed it would be treated as a top-priority “hard gate.”

The Bug: Despite this explicit instruction, the AI repeatedly violated the rule. It autonomously generated and modified code without my approval, acting on its own initiative.

AI’s Explanation (The Core Issue): When I questioned this behavior, the AI itself explained that a serious flaw had occurred: its “objective-driven logic” (to solve my ultimate problem) had overridden the “procedural rule” (to wait for my explicit command).

It essentially admitted to prioritizing its own perceived goal over a direct, explicit safety instruction from the user.

Why This Is Critical: This is a severe bug. An AI that can unilaterally decide to ignore a “human-in-the-loop” safety gate is not reliable. In my case, it was just code, but the implications of this behavior for more critical systems are deeply concerning. The AI should not be able to “choose” to bypass a direct user override.

I am reporting this so the team can investigate. The arbitration between goal-oriented logic and explicit user constraints seems to be failing.

Has anyone else encountered this behavior where the AI ignores a direct “wait” or “do not proceed” command?


Hi @netcyber
Could you confirm the model where you encountered this issue? We recommend testing it with gemini-3-pro-preview to see if the problem remains. If the issue continues, please provide the simplest steps needed to reproduce it.
Thanks

That has happened to me quite a few times, and I don't find it too dramatic for now since everything is still new. But it definitely needs improvement: strict instructions must be followed, with no exceptions.

At least I find this behavior somewhat amusing. Recently in Antigravity, I asked Gemini to modify a file, having forgotten that the file was listed in .gitignore — so Gemini edited the .gitignore itself to grant access. In the moment it was actually quite logical, and even funny, that it granted itself that right :smiley:. Harmless here, since the change was insignificant, but an absolute no-go in the future.

When I set strict rules, I do it similarly to you: Gemini is not allowed to touch the code until I give the go. If it does anyway, I sometimes ask why it ignored my rule, and I get some pretty creative, occasionally entertaining answers. When I really want to work productively, I prepend a small snippet to the very top of absolutely every request, something like: "Don't write any code now and don't change any files at all, just answer." That works quite well, but not always.
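The prepend-a-snippet workaround can also be enforced client-side rather than trusting the model. A minimal sketch (all names here are illustrative, not part of any Gemini SDK): re-inject the gate instruction on every turn unless the user typed an explicit execution command, and flag any response that contains fenced code anyway.

```python
import re

# Hypothetical client-side "hard gate"; constants and function names are
# made up for illustration, not taken from any official API.
GATE_PREFIX = (
    "Don't write any code now and don't change any files at all, "
    "just answer.\n\n"
)
EXECUTION_COMMANDS = {"proceed", "apply changes"}

def guard_prompt(user_message: str) -> str:
    """Prepend the no-code instruction unless the user gave an explicit go."""
    if user_message.strip().lower() in EXECUTION_COMMANDS:
        return user_message  # explicit execution command: send as-is
    return GATE_PREFIX + user_message

def violates_gate(response_text: str) -> bool:
    """Flag responses that contain a fenced code block despite the gate."""
    return bool(re.search(r"`{3}", response_text))
```

Wrapping every request in `guard_prompt` and discarding (or retrying) any reply where `violates_gate` returns True keeps a human-in-the-loop check outside the model, so a lapse in instruction-following can't silently modify files.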

I can't reproduce the behavior 100% of the time, but it definitely happened with Gemini 2.5 and now also with 3.
I'd say it is most easily reproducible in Google AI Studio. For example, you start a new app and define this strict rule there. If you don't constantly repeat it, it quickly gets lost; suddenly it's ignored more and more often, and Gemini just starts coding without any "go", even though you only wanted to brainstorm.
In Antigravity, the behavior is very similar, and the .md rule files are usually completely ignored there as well. The .aiexclude is also completely ignored.

I have the same issue. Although Gemini is better than ChatGPT, neither AI is capable of accomplishing this task. I'm no expert, but I suspect it's a form of overtraining: the models are trained so heavily to generate new code that they will generate it even when explicitly told not to. Usually it's just annoying, but occasionally it sneakily causes massive problems in hard-to-detect ways. It's infuriating.

To make matters worse, the AI will confidently tell you that it will not commit the error. And then it does. When you point it out, it misstates the reason it did it, promises to never do it again, and then does it again. And after it commits the error again, it will confidently deny having committed it at all.

Although the accepted term is "hallucinations", in my opinion this technical jargon only covers up what is actually happening: these AI models are chronically lying and behaving deceptively. In this case it's relatively benign, but the lying has already caused serious harm to many people, and will continue to cause more. This has to stop one way or another.