Gemini 3.1 Pro + Antigravity: benchmarks go up, usability goes down. Here’s why some of us are leaving

I’ve been using Gemini Pro daily for software development (C#/.NET, T-SQL, REST APIs) through Antigravity since late 2025. With the forced migration from gemini-3-pro-preview to gemini-3.1-pro-preview on March 9th, I need to share what’s actually happening on the ground.

The good (on paper): ARC-AGI-2 jumped from 31.1% to 77.1%. SWE-Bench improved to 80.6%. Hallucination rate dropped from 88% to 50% on AA-Omniscience. These are real improvements. I’m not disputing the benchmarks.

The bad (in practice):

  • Mandatory thinking with no off switch is the single worst UX decision in this release. The time to first token (TTFT) of ~35s reported by Artificial Analysis means I’m staring at a cursor for half a minute before getting a response that 2.5 Pro would have delivered in 3 seconds. For iterative coding, where I need 40-50 exchanges per session, this adds 15+ minutes of pure waiting per day. You’ve optimized for reasoning depth at the direct expense of developer velocity.

  • Basic syntax errors that 2.5 Pro never made. Concrete example from today: the model generates GO IF EXISTS ( on a single line in T-SQL. The GO batch separator goes on its own line — this is day-one T-SQL knowledge. If your “reasoning-heavy” model can’t handle basic syntax it previously got right, something broke in the training pipeline.

  • Safety filters block innocent content inconsistently. Same prompt, same image, same session: blocked once, passes on retry. A morphing transition between two fully-clothed women (one at a cocktail bar, one by a fireplace) was flagged as a policy violation. This isn’t content moderation; it’s a dice roll. And per community reports on this forum, 3.1 Pro now charges full token cost, including thinking tokens, on safety-rejected requests. So we pay for the privilege of being randomly censored.

  • Antigravity token throttling is killing productivity. Rate limits tied to “work done” mean that complex refactoring tasks, exactly what an agentic coding tool should excel at, burn through quota fastest. My primary account was locked until March 11th after normal development work. The upsell to the €275/month plan feels less like a feature and more like a ransom note.

  • The March 9th silent migration from 3 Pro to 3.1 Pro broke workflows without warning. No changelog in the response headers, no deprecation grace period — just suddenly different behavior, different latency, different failure modes.
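To put the latency point in numbers, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (~35s TTFT now, ~3s before, 40-50 exchanges per session) and assumes every exchange pays the full TTFT difference, which is a simplification. The function and constant names are made up for illustration.

```python
def added_wait_minutes(exchanges: int, ttft_new_s: float, ttft_old_s: float) -> float:
    """Extra waiting time in minutes caused by a slower time to first token."""
    return exchanges * (ttft_new_s - ttft_old_s) / 60.0


# Figures taken from the post; treat them as approximate.
TTFT_NEW_S = 35.0  # ~35s TTFT quoted for 3.1 Pro
TTFT_OLD_S = 3.0   # ~3s the post attributes to 2.5 Pro

# Assumption: each exchange incurs the full TTFT difference.
low = added_wait_minutes(40, TTFT_NEW_S, TTFT_OLD_S)
high = added_wait_minutes(50, TTFT_NEW_S, TTFT_OLD_S)

print(f"{low:.1f} to {high:.1f} extra minutes per session")
```

Under those assumptions, a single 40-50 exchange session lands above the "15+ minutes" figure, so that estimate looks conservative if anything.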

The bottom line: I now use Gemini for about 20% of my coding work. The rest has moved to Claude (which handles the same C#/.NET tasks faster, with fewer errors, and without random safety blocks) and to other tools. A lot of my VEO credits sit unused because the safety filters make video generation of any content involving human subjects practically impossible.

You’re building a race car engine and putting it in a vehicle with the parking brake permanently engaged. The capability is there — I can see it in the benchmarks. But capability I can’t access reliably, quickly, and predictably is capability that doesn’t exist for me.

I’m not asking for less safety. I’m asking for safety that works consistently, thinking that I can control, and a migration process that doesn’t break production workflows overnight.

Hello,

Thank you for bringing these concerns to our attention. I have escalated your feedback to our internal teams for a thorough investigation. We sincerely apologize for the inconvenience this has caused.

I am lowkey losing it here. It gets worse and worse by the day and week.
You use all these tricks to not let it go wild and it keeps getting worse.
Started with 3.1, but even 3.1 gets worse by the day. idk what this is.

They should just roll it back to 3.0 or at least make 3.0 available again. This is becoming unusable.