Why Opus 4.6 quotas are dead in 10-20 minutes (and how we force Google to fix it)

Fellow developers,

We need to talk about the elephant in the room. For months, we’ve endured the silent downgrades, the shadow banning, and the agonizingly slow responses from the devs. But today’s stealth decrease to the Ultra plan quotas is the absolute breaking point.

Like many of you, I upgraded to the Ultra tier for one reason: to use the most capable models on the market, specifically Claude Opus 4.6 (Thinking). When this tier launched, I could run my standard heavy workflow for about 2 solid hours before hitting my limit.

Today? My entire Opus quota evaporated in 20 minutes flat.

Now I’m forced to stare at an empty yellow bar mocking me with a “Refreshes in 4 hours, 40 minutes” timer, effectively bricking my workflow for the rest of the afternoon.

I’ve spent the last few days digging deep into the network requests, API overhead, and the underlying economics of what’s happening behind the scenes. Here is the ugly truth Google won’t tell us:

The new “Thinking” architectures for models like Opus 4.6 and Sonnet 4.6 consume an astronomical amount of invisible reasoning tokens to process chain-of-thought before outputting a single line of code to our editors. Google vastly underestimated the compute overhead required for power users on the Antigravity IDE. But instead of upgrading their infrastructure or absorbing the cost like a trillion-dollar company should, they are quietly passing the bottleneck down to us.

They are suffocating our daily limits to starvation levels for one incredibly greedy reason: to push us into clicking that new “Enable AI Credit Overages” toggle in the Models settings.

They are intentionally slashing our base quotas so we hit a wall in 20 minutes, effectively forcing us into pay-as-you-go microtransactions just to finish a standard workday. They are double-charging us for compute we already paid a premium for.

You cannot market a revolutionary IDE and then lobotomize it after 20 minutes. This isn’t a professional tool anymore; it’s a freemium mobile game trap.

Complaining in isolated feedback threads hasn’t worked. The PMs are ignoring us. So here is the revolution:

Google only speaks one language: Metrics and Churn.

  1. Refuse the Overages. Do NOT toggle “Enable AI Credit Overages.” Do not give them another cent to reward a broken promise.

  2. Cancel Auto-Renew today. Go to your account settings right now, cancel your Ultra plan, and explicitly write “Unusable Opus 4.6 Quotas” as your reason. Hitting their retention metrics is the only alarm bell that will ring in a Google boardroom. You can always resubscribe when they fix it.

  3. Make them answer. Upvote this thread. Share it. Keep it at the top of the board until a Lead PM actually steps in here with a transparent roadmap and restores our quotas to what was advertised.

We are not beta testers, and we are not an ATM. We are the developers keeping this platform alive.

Google, if you don’t restore the Ultra quotas and give us the transparency we deserve, we are walking.

Your move. Who is with me? Drop your empty quota screenshots below. Let’s make some noise.

While I fully agree that refusing to give Google even 1 cent is the best way to make them at least acknowledge our issues, I’m sorry to say I burst out laughing as soon as I read “or absorbing the cost like a trillion-dollar company should”. This was genuinely one of the funniest things I’ve read in a while :rofl: :rofl:
How do you think Google became a trillion-dollar company? I’ll give you a hint: it wasn’t by “absorbing the cost”.
I think the only way this will be fixed is if Google releases a new version of Gemini that is as capable as Opus 4.7, or more so.
Opus requires way too much compute to use in a fully autonomous system like Antigravity.
And anyway, if you absolutely want to use Opus, why not get Claude Code instead? It’s much cheaper than Antigravity for sure.

A thing worth noting: even Anthropic spends huge amounts of money when testing their models on fully autonomous tasks. The Opus models simply require a lot of compute, and compute is limited; there’s no real way around it. Over the next few years models will either slowly get a bit cheaper, or stay at the same price but get better and faster at completing a task, which lowers the overall cost. We just have to wait it out.

Glad I could give you a good laugh! :clinking_beer_mugs: And honestly, you’re 100% right on one thing: Google didn’t get to a multi-trillion-dollar valuation by running a charity. They are a ruthless profit machine.

Google absorbed billions in server costs for years to make YouTube the undisputed king of video. They bled money to put Android in everyone’s pockets. When you are trying to completely monopolize the AI developer space and crush competitors like Anthropic’s own Claude Code, you absorb the compute cost to lock down the power users and build an unbreakable ecosystem. Squeezing your highest-paying “Ultra” users dry over API margins right now is a fundamental failure of their own long-term strategy.

But let’s set the economics of compute aside, because your point about Opus 4.6 being insanely heavy for fully autonomous loops is completely scientifically accurate. And that is exactly why Google’s current strategy is essentially a scam.

To address your points directly:

1. “Opus requires way too much compute… compute is limited.”

If Google’s PMs and bean-counters knew that the token burn rate for autonomous Opus was mathematically unsustainable, they never should have anchored the entire Ultra subscription around it.

If a premium steakhouse realizes Wagyu beef has gotten too expensive, they don’t charge you $150, serve you a single meatball, and tell you “cows are expensive, just wait a few years.” If they cannot afford the compute to let Opus run in an autonomous IDE, selling it as a flagship feature of a flat-rate Ultra tier is straight-up false advertising.

2. “Why not get Claude Code instead?”

Because we didn’t pay for a raw CLI tool. We pay the Antigravity premium for the deeply integrated ecosystem: the multi-agent orchestration, the visual sandboxes, the seamless GUI workspace management. Tearing down our workflows to switch back to a terminal takes days. Telling users to “just go use Anthropic’s platform” is exactly what Google wants us to do when our quota runs out, all while they quietly keep our monthly Ultra subscription fee anyway. If the best solution to Antigravity being broken is to leave the IDE, then I’m taking my subscription money with me.

3. “I think the only way this will be fixed is if google releases a new version of gemini…”

You just accidentally exposed their exact corporate playbook. By purposefully starving the Anthropic quotas to 20 minutes, they are making the experience so painfully restrictive that we throw up our hands and say, “Fine, I’ll just use Gemini 3.1 Pro.” Running Gemini on their own TPUs costs them pennies; paying Anthropic’s API fees eats their margins. They are artificially crippling a superior model on their platform to force adoption of their own ecosystem.

4. “Just gotta wait it out”

We are developers shipping production code today. I am not paying a premium subscription fee to fund Google’s R&D while I wait for a hypothetical Gemini update to save the day, or for compute to magically get cheaper. If they silently slashed the product’s utility by 80% over the last few months, the subscription price needs to drop proportionally.
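As a rough back-of-the-envelope check on that "proportional" argument, here is the arithmetic using the figures from the opening post (about 2 hours of Opus per quota window at launch, about 20 minutes now). The $200 monthly price is a placeholder assumption, not an official number, and the helper names are made up for illustration.

```python
def usage_reduction(before_minutes: float, after_minutes: float) -> float:
    """Fraction of usable time lost between the old and new quota window."""
    return 1 - after_minutes / before_minutes


def proportional_price(price: float, before_minutes: float, after_minutes: float) -> float:
    """Price scaled down by the same ratio as the usable time."""
    return price * after_minutes / before_minutes


# Figures from the opening post: ~120 minutes at launch, ~20 minutes now.
BEFORE_MINUTES = 120
AFTER_MINUTES = 20
ASSUMED_MONTHLY_PRICE = 200.0  # placeholder assumption, in USD

reduction = usage_reduction(BEFORE_MINUTES, AFTER_MINUTES)
fair_price = proportional_price(ASSUMED_MONTHLY_PRICE, BEFORE_MINUTES, AFTER_MINUTES)

print(f"Usable time lost: {reduction:.0%}")
print(f"Proportional price: ${fair_price:.2f}")
```

With those inputs the loss works out to roughly five sixths of the original usable time, which is in the same ballpark as the 80% figure.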

We shouldn’t normalize this. If we accept the “compute is hard” excuse now, we validate their bait-and-switch. They will do the exact same thing when Opus 4.7 drops, and we’ll be right back here.

Don’t let them off the hook and don’t excuse the pricing. Keep the Ultra cancellations rolling, everyone. :chart_decreasing::raised_fist:

I have noticed this as well. When Opus 4.6 first launched, it did seem that we were able to get a lot more prompts into a single session. However, are we certain that Google has slashed the number of tokens we’re permitted? Perhaps I’m behind the curve if something was announced and I’m just unaware. Anthropic did change some things on their end prior to the launch of Opus 4.7. Is it possible that Anthropic updated parts of Opus 4.6’s harness to be as token-greedy as 4.7? I’m not trying to make excuses for Google. I’m just trying to get closer to the truth of the matter, because if that’s the case, I think a different prompting strategy might be warranted to get better results.

I’m just trying to look on the bright side and see how to take advantage of a situation like this, even if we don’t have the power to get an unlimited quota. A lot of the research I’ve been doing lately has been around token efficiency and what that might mean. For example, the genie-in-the-lamp problem is interesting precisely because you only have three wishes. I feel like a lot of the time I prompt like Homer Simpson when he met the guru on top of the mountain. The guru says, “You may ask me three questions.” Apu replies, “That’s great, because I only needed one.” Homer interjects and says, “Are you really the head of the Kwik-E-Mart?” When the guru replies “yes”, Homer then asks “Really?” twice. So I think that in times like this, constraint can drive innovation in how we interact with the models to be more token-efficient. It would of course be much better to have an unlimited token quota so we could test this quickly over multiple iterations, but for now I guess we’ll just have to make do.

Actually I think you’re wrong. Google doesn’t care about having the best model. If they end up with the best model at some point, it will be a coincidence, a byproduct of their superior infrastructure.
You can easily see this from the fact that Google just gave Anthropic 10 billion (check the news) to help fund the continued development of Claude models, with an option for an additional 30 billion if Claude models meet certain performance targets.
Google doesn’t want to win the “best model” race, because that’s not the most profitable thing. What Google wants is to win the infrastructure race: having everyone stay on their platforms while being free to use whatever models they want. With models, they just want to ensure that they have a somewhat competitive one; whether it’s the best or not doesn’t matter to them at all.

What’s happening with antigravity is a shame, but what people should complain about is the endless “agent terminated due to error” bug. I feel this is much more severe than the quota reduction.

Anyway, Claude Code is not just a CLI tool. There’s an extension for VS Code, and it’s almost like using Antigravity. It costs 90 dollars, versus Antigravity Ultra at 130-something (currently discounted for the first 3 months) and around 230 after that, if I remember correctly. The price difference alone makes it obvious which one is the better deal. You can get 2 Claude accounts for 90 dollars x 2, which is still less than the full price of the Antigravity Ultra subscription, and with 2 accounts you’d basically never hit usage limits unless you’re running multi-agent sessions day and night.
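To make the price comparison concrete, here is a quick sketch of the cumulative cost over time. The dollar amounts are the approximate, from-memory figures quoted above (90 for Claude Code, about 130 for the first 3 months of Ultra, about 230 afterwards), not official pricing, and the function names are made up for illustration.

```python
# Approximate monthly prices in USD, as recalled in the post above.
CLAUDE_CODE_MONTHLY = 90
ULTRA_DISCOUNTED_MONTHLY = 130  # "130 something" for the first 3 months
ULTRA_FULL_MONTHLY = 230        # "230 ish" after the discount ends
DISCOUNT_MONTHS = 3


def ultra_cost(months: int) -> int:
    """Cumulative Ultra cost: discounted rate first, full rate afterwards."""
    discounted = min(months, DISCOUNT_MONTHS)
    return discounted * ULTRA_DISCOUNTED_MONTHLY + (months - discounted) * ULTRA_FULL_MONTHLY


def claude_code_cost(months: int, accounts: int = 1) -> int:
    """Cumulative Claude Code cost for a given number of accounts."""
    return months * CLAUDE_CODE_MONTHLY * accounts


for months in (3, 12):
    print(
        f"{months:>2} months: Ultra ${ultra_cost(months)}, "
        f"Claude Code x1 ${claude_code_cost(months)}, "
        f"Claude Code x2 ${claude_code_cost(months, accounts=2)}"
    )
```

One nuance the numbers show: during the 3 discounted months, two Claude Code accounts (180 a month) cost more than Ultra (about 130), so the "two accounts are still cheaper" point only holds against the full price.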

Most developers are already using Claude Code and Cursor.

And to answer @Phil3: refining prompts only helps those who were prompting extremely badly. There’s only so much you can write in a prompt, because the models struggle to do too much in one go. In Antigravity in particular you don’t really have control over each individual prompt. You only control your original prompt; the rest is all automatic, including of course the tokens used.

And I personally get errors like this every 2 seconds when trying to use opus. I can’t even reach the end of the max tokens that I’m allowed to use because it just blocks continuously:

Trajectory ID: 0a5ffa68-7e73-49df-8178-3d251ae98
Error: HTTP 503 Service Unavailable
Sherlog:
TraceID: 0x35da6c4f11619d
Headers: {"Alt-Svc":["h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000"],"Content-Length":["429"],"Content-Type":["text/event-stream"],"Date":["Sat, 25 Apr 2026 15:17:46 GMT"],"Server":["ESF"],"Server-Timing":["gfet4t7; dur=868"],"Vary":["Origin","X-Origin","Referer"],"X-Cloudaicompanion-Trace-Id":["35da6c4f11619d54"],"X-Content-Type-Options":["nosniff"],"X-Frame-Options":["SAMEORIGIN"],"X-Xss-Protection":["0"]}

{
  "error": {
    "code": 503,
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "domain": "cloudcode-pa.googleapis.com",
        "metadata": {
          "model": "claude-opus-4-6-thinking"
        },
        "reason": "MODEL_CAPACITY_EXHAUSTED"
      }
    ],
    "message": "No capacity available for model claude-opus-4-6-thinking on the server",
    "status": "UNAVAILABLE"
  }
}
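
For anyone who wants to check what kind of failure they are actually hitting, here is a minimal sketch that parses an error body like the one above and pulls out the reason and model. The payload is copied from this post; `classify_error` is a hypothetical helper, not part of any Antigravity or Google API. What a genuine quota-exhausted response looks like is not shown in this thread, so the sketch only flags the capacity case.

```python
import json

# Error body copied from the post above.
ERROR_BODY = """
{
  "error": {
    "code": 503,
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "domain": "cloudcode-pa.googleapis.com",
        "metadata": {"model": "claude-opus-4-6-thinking"},
        "reason": "MODEL_CAPACITY_EXHAUSTED"
      }
    ],
    "message": "No capacity available for model claude-opus-4-6-thinking on the server",
    "status": "UNAVAILABLE"
  }
}
"""


def classify_error(body: str) -> dict:
    """Summarize an error body shaped like the one in this post (hypothetical helper)."""
    error = json.loads(body)["error"]
    details = error.get("details", [])
    reasons = [d["reason"] for d in details if "reason" in d]
    models = [d["metadata"]["model"] for d in details if "model" in d.get("metadata", {})]
    return {
        "code": error.get("code"),
        "status": error.get("status"),
        "reasons": reasons,
        "models": models,
        # True when the server reports it has no capacity for the model,
        # which is a different condition from a user running out of quota.
        "server_capacity_issue": "MODEL_CAPACITY_EXHAUSTED" in reasons,
    }


summary = classify_error(ERROR_BODY)
print(summary)
```

The point of separating the two cases: a 503 with `MODEL_CAPACITY_EXHAUSTED` says the server had no capacity for the model at that moment, so it should be reported as an availability problem rather than lumped in with the quota complaints.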

These quotas are shameful. I’m not crazy enough to pay 200 bucks; I’m just on the “Pro” plan, so I get 20 minutes of Opus every 5, then 10, then 15, and now 25 days :rofl:

Claude Code is 90 dollars and works flawlessly. Worth switching.