Gemini 3.1 Pro + Antigravity: benchmarks go up, usability goes down, and here’s why some of us are leaving

I’ve been using Gemini Pro daily for software development (C#/.NET, T-SQL, REST APIs) through Antigravity since late 2025. With the forced migration from gemini-3-pro-preview to gemini-3.1-pro-preview on March 9th, I need to share what’s actually happening on the ground.

The good (on paper): ARC-AGI-2 jumped from 31.1% to 77.1%. SWE-Bench improved to 80.6%. Hallucination rate dropped from 88% to 50% on AA-Omniscience. These are real improvements. I’m not disputing the benchmarks.

The bad (in practice):

  • Mandatory thinking with no off switch is the single worst UX decision in this release. A time to first token (TTFT) of ~35 s, as reported by Artificial Analysis, means I’m staring at a cursor for half a minute before getting a response that 2.5 Pro would have delivered in 3 seconds. For iterative coding, where I need 40-50 exchanges per session, that adds 15+ minutes of pure waiting per day. You’ve optimized for reasoning depth at the direct expense of developer velocity.

  • Basic syntax errors that 2.5 Pro never made. Concrete example from today: the model generates GO IF EXISTS ( on a single line in T-SQL. The GO batch separator goes on its own line — this is day-one T-SQL knowledge. If your “reasoning-heavy” model can’t handle basic syntax it previously got right, something broke in the training pipeline.

  • Safety filters block innocent content, and they do it inconsistently. Same prompt, same image, same session: blocked once, passes on retry. A morphing transition between two fully clothed women (one at a cocktail bar, one by a fireplace) was flagged as a policy violation. This isn’t content moderation; it’s a dice roll. And per community reports on this forum, 3.1 Pro now charges the full token cost, including thinking tokens, on safety-rejected requests. So we pay for the privilege of being randomly censored.

  • Antigravity token throttling is killing productivity. Because rate limits are tied to “work done”, complex refactoring tasks (exactly what an agentic coding tool should excel at) burn through quota fastest. My primary account was locked until March 11th after normal development work. The upsell to the €275/month plan feels less like a feature and more like a ransom note.

  • The March 9th silent migration from 3 Pro to 3.1 Pro broke workflows without warning. No changelog in the response headers, no deprecation grace period — just suddenly different behavior, different latency, different failure modes.
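For anyone who wants to catch the GO-placement error from the T-SQL bullet before running a generated script, here is a minimal sketch of a line-based check in Python. The function name misplaced_go_lines and the regular expressions are my own hypothetical illustration, not part of any Microsoft or Google tool. It assumes the documented rule that GO must sit on its own line, optionally followed by a repeat count and a trailing -- comment. It does not understand string literals or /* */ comments, so treat it as a quick sanity check rather than a parser.

```python
import re

# Hypothetical helper for illustration only; not part of any official tooling.

# A valid batch-separator line: optional leading whitespace, GO,
# an optional repeat count, and an optional trailing "--" comment.
VALID_GO = re.compile(r"^\s*GO(\s+\d+)?\s*(--.*)?$", re.IGNORECASE)

# Any line whose first token is GO (the \b keeps GOTO from matching).
STARTS_WITH_GO = re.compile(r"^\s*GO\b", re.IGNORECASE)


def misplaced_go_lines(script: str) -> list[int]:
    """Return 1-based line numbers where GO shares its line with other code."""
    bad = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        # Flag lines that start with GO but carry anything beyond
        # an optional count or a "--" comment.
        if STARTS_WITH_GO.match(line) and not VALID_GO.match(line):
            bad.append(lineno)
    return bad


if __name__ == "__main__":
    generated = (
        "CREATE TABLE t (id INT)\n"
        "GO IF EXISTS (SELECT 1 FROM t)\n"
        "    DROP TABLE t\n"
        "GO\n"
    )
    print(misplaced_go_lines(generated))
```

Run against the sample script, it reports line 2, which is exactly the "GO IF EXISTS (" pattern; the fix is to put GO on its own line and start IF EXISTS on the next one.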

The bottom line: I now use Gemini for about 20% of my coding work. The rest has moved to Claude (which handles the same C#/.NET tasks faster, with fewer errors, and without random safety blocks) and to other tools. A lot of my VEO credits sit unused because the safety filters make video generation of any content involving human subjects practically impossible.

You’re building a race car engine and putting it in a vehicle with the parking brake permanently engaged. The capability is there — I can see it in the benchmarks. But capability I can’t access reliably, quickly, and predictably is capability that doesn’t exist for me.

I’m not asking for less safety. I’m asking for safety that works consistently, thinking that I can control, and a migration process that doesn’t break production workflows overnight.

Hello,

Thank you for bringing these concerns to our attention. Please be assured that I have shared your feedback with our internal team for further review. We sincerely apologize for the inconvenience this has caused. We have escalated the issue to our internal teams for a thorough investigation.