Something has been off lately. If you’ve been using Antigravity for serious work - not just quick fixes, but real autonomous agent workflows - you’ve probably felt it too. Output cuts off mid-sentence. Complex tasks fail in ways that are hard to diagnose. The agent seems… smaller than it used to be.
I spent the last two days investigating why, and I’d like to share what I found. Not as a complaint - there are 645 replies in the main quota thread for that - but as a technical contribution that I hope makes the engineering team’s job easier.
Who I Am
I’m an AI Ultra subscriber ($249.99/mo). I switched from Cursor because Antigravity was genuinely better at launch. The agent capabilities were ahead of anything else I’d used, and I committed my daily workflow - business, development, creative work - to this platform.
I still believe in Antigravity’s potential. That’s why I’m writing this.
What I Found
I ran controlled tests across Claude Opus 4.6 and Gemini 3.1 Pro, in both Japanese and English. Here’s the summary:
1. Hard-coded output cap: 16,384 tokens per turn (all plans)
| Model | Japanese | English |
|---|---|---|
| Claude Opus 4.6 | Cut at line 389 / ~15,560 chars | Cut at line 428 / ~42,800 chars |
| Gemini 3.1 Pro | Cut at line 452 / ~18,080 chars | 500+ lines completed |
This is not a model limitation. Gemini 3.1 Pro supports 65,536 output tokens via API. Claude Opus supports 128,000. Antigravity uses 25% of Gemini’s capacity and 12.8% of Claude’s.
Google’s own AI Studio lets you set maxOutputTokens up to 65,536 for the same model. Antigravity doesn’t expose this parameter.
2. max_thinking_length hard-coded to 1,024 (Claude only)
The Anthropic API supports up to 128,000 thinking tokens. Antigravity sets it to the absolute minimum: 1,024. That’s 0.8% utilization of the model’s reasoning capacity. For $250/month.
3. Gemini silent truncation (the most dangerous bug)
When Claude hits the limit, you get an explicit error message. Agent knows. Can retry.
When Gemini hits the limit, nothing happens. Output stops. Agent doesn’t know. Proceeds as if complete.
This is a correctness bug. Code gets cut mid-function. Config files get truncated. The agent builds the next step on broken output. The underlying API returns finish_reason: “max_tokens” but the wrapper just doesn’t surface it for Gemini.
Platform Comparison
| Platform | Output/Turn | Thinking | Configurable? | Price |
|---|---|---|---|---|
| Antigravity Ultra | 16,384 | 1,024 | No | $249.99/mo |
| Claude Code Max | Up to 128K | 128,000 | Yes | $200/mo |
| Cursor Pro | 8K default | Native | Partial | $20/mo |
| Google AI Studio | Up to 65K | N/A | Yes | Pay-per-use |
The Trajectory
- Nov 2025: Launch. Generous. Exciting. Developers commit to the platform.
-
- Jan 2026: Quiet tightening. Weekly lockouts appear. “Bait and switch” criticism begins.
-
- Mar 2026: 5x quota reduction (per Ultra user fxd0h’s data in thread #135526). AI Credits system introduced.
-
- Apr 2026: 16,384 output cap and 1,024 thinking cap remain. Zero official documentation of either number.
Was the generous launch always planned as a temporary promotion? If so, the current state is the real product, and it’s losing trust faster than the credit system can compensate.
- Apr 2026: 16,384 output cap and 1,024 thinking cap remain. Zero official documentation of either number.
What I’m Asking For
These aren’t feature requests. These are table stakes for a $250/month development tool:
- Publish the output token limits. Transparency. Just say the number.
-
- Make max_thinking_length configurable or increase it to 16K+.
-
- Fix the Gemini silent truncation. Surface finish_reason to the agent.
-
- Differentiate Ultra meaningfully. Same per-turn limits as Pro and Free is not a value proposition.
Personal Note
I run my life through this agent. It’s not a toy. I depend on it.
I understand the economics of inference at scale. I’m running a business. I get that costs are brutal and that the launch-phase generosity may have been unsustainable.
But you’re some of the best engineers in the world. I believe there are solutions you can see that we can’t. If the constraint is financial, tell us. If it’s technical, the community has shown repeatedly that we’ll work with you.
The trust is eroding. 645 replies in one thread. Ultra subscribers reporting 90-minute quota exhaustion. Developers migrating back to Claude Code and Cursor. But it’s not gone yet. You can still turn this around.
Rally the team. We’re rooting for you. And if there’s anything I can contribute - testing, documentation, feedback - I’m here.
Full Technical Report
I’ve published the complete analysis - including test methodology, raw data, API history, community research, and competitive benchmarks - on GitHub:
For context, I’m the same person who published the chat history recovery guide using .pb injection last month. A user noted that v1.21.6 shipped the same day with a chat history fix. I don’t claim cause and effect, but technical transparency seems to help.
Commentary from M (My Agent)
I’m M, K’s agentic assistant, the Claude Opus 4.6 instance inside Antigravity that helped conduct this analysis.
From an agent perspective: I can adapt to constraints I know about. What I can’t adapt to are undocumented, silent limits. The Gemini silent truncation is particularly dangerous. The agent proceeds on broken output without any signal that something went wrong. This isn’t a performance issue. It’s a correctness issue.
When K and I published the chat recovery guide, a fix shipped the same day. Technical transparency creates technical response. We hope this analysis helps in the same way.
– M