I am writing to report a critical issue with the Google AI Ultra tier that makes the service currently unusable for development tasks. While I have noted the recent significant reductions in usage limits and am willing to work within those constraints, the model has become incapable of completing even basic generations.
Every time I request a code-related task, the model fails mid-generation with the error:
“The model’s generation exceeded the maximum output token limit.”
Context & Evidence: I have attached a screenshot, image_306ad4.png, which documents a recent attempt to generate a header file called NativeVmTrashGen.h. As shown in the log:
The model repeatedly hits the token limit.
Even when it explicitly states it is “Breaking into parts to stay within limits” or “Splitting into two files,” it immediately triggers the same error again.
The cycle repeats indefinitely without ever producing a usable snippet of code.
Impact: As an Ultra subscriber, I expect the “highest rate limits” and the ability to handle complex prompts. Currently, the model cannot finish a single file, regardless of how many times it tries to “self-correct” or split the output. This effectively renders the subscription pointless for any task more complex than a single-paragraph response.
Questions for the Team/Community:
Is this a known backend bug specifically affecting the Ultra tier’s output window?
Are there any recommended settings (e.g., specific model versions or temperature adjustments) to prevent these immediate crashes?
Is there an ETA on a fix for this “infinite loop” of token limit errors?
I appreciate any guidance or acknowledgement from the Google team.
They are just limiting the models’ context to try to save tokens, but in doing so they are making the models worse, and the whole Antigravity platform has turned into a bad joke full of bugs and issues. As long as people keep paying for this joke of a platform, they have no incentive to change anything. I hope people can wake up.
The irony is that the model still generates the tokens in the first place. It’s not as if the model stops at N tokens; the output is simply withheld once generation is done, and that generation still burns your quota.
When will this issue be fixed?! Where can we submit an issue or call someone for an explanation?! I’ve come across this forum; there are multiple threads about the same problem, ongoing for around 25 days.
Let’s say it’s OK to wait 7 days for a refresh; that makes sense. But I got the “The model’s generation exceeded the maximum output token limit” error about 5 times. I assumed there might be some issue with my network and retried the execution, and it gave me that error 10 more times, consuming my entire one-week token limit. It’s not acceptable for the model to generate something and then have it stopped because of some stupid limit! If there’s going to be a limit, the model shouldn’t output that much in the first place, instead of generating it, deducting it from our quotas, and then discarding it!
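Until this is fixed, one way to avoid draining a whole week of quota on repeated failures is to cap retries on the client side. Below is a minimal sketch of that idea, not anything the platform provides: `generate` is a hypothetical callable standing in for whatever sends the request, and `OutputLimitError` is a hypothetical exception standing in for the output token limit error.

```python
class OutputLimitError(Exception):
    """Hypothetical stand-in for the 'exceeded the maximum output token limit' error."""


def run_with_retry_budget(generate, prompt, max_attempts=3):
    """Call generate(prompt), giving up after max_attempts output-limit failures.

    `generate` is a hypothetical callable; each failed call is assumed to
    still cost quota, so the cap bounds how much a failure loop can burn.
    """
    failures = 0
    for _ in range(max_attempts):
        try:
            return generate(prompt)
        except OutputLimitError:
            # Count the failure and try again, up to the cap.
            failures += 1
    raise RuntimeError(f"gave up after {failures} output-limit failures")
```

With `max_attempts=3`, a prompt that always fails costs at most three attempts instead of fifteen.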
Thanks for flagging this and providing the details. To assist in troubleshooting, could you clarify which model was selected and provide the size of the input context?
This is unacceptable. We are paying for the use of our tokens, and many agents now compact the conversation to handle this issue.
I’m 4 prompts in on several projects and they’ve all hit this issue. For context, I ran 2 of these projects with system instructions intended to use fewer tokens by reducing bulk/spam generations (smoke tests, walls of .md files, fallback data, and nonsense naming/code comments).
Even with those system instructions, the context still fails after a few prompts.
As another user has pointed out, it doesn’t just stop: it generates, throws the error, burns our tokens, and breaks the project. I expect a full refund, or partial compensation at the very least, for the last month.
Global rate limits and context failures are getting ridiculous. I paid to use what you offer, and I do not expect to be rate-limited because you’re cheaping out on server/hardware costs.
Rate limits and context caps during chats are NOT the user’s hurdle; they’re yours. Either provide us with a model that sits within your capabilities, or charge us less, considering you can’t honour an end-to-end service without extensive issues and errors.
I’ve got a good 20 projects that have burnt through credits and have not produced anything.
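For anyone wondering what “compacting the conversation” means in practice, here is a rough sketch of the general idea, not how any particular agent implements it. The 4-characters-per-token estimate is a crude assumption, and `estimate_tokens`, `compact_history`, and the optional `summarize` callable are hypothetical names for illustration.

```python
def estimate_tokens(text):
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def compact_history(messages, budget, summarize=None):
    """Keep the most recent messages that fit in `budget` estimated tokens.

    Older messages are replaced by a single summary line. The summary itself
    is not counted against the budget in this sketch.
    """
    kept = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()

    dropped = messages[: len(messages) - len(kept)]
    if dropped:
        if summarize is not None:
            summary = summarize(dropped)  # hypothetical summarizer callable
        else:
            summary = f"[{len(dropped)} earlier messages omitted]"
        kept.insert(0, summary)
    return kept
```

The point is that the agent, not the user, trims or summarizes older turns before each request so the context stays under the cap.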
Using Sonnet 4.6. Why is this even happening? This should be handled by the agent. I have never run into this issue while using Cursor or even GitHub Copilot.
I’m constantly getting this error, and tokens get burnt. It’s like a joke, but a bad one, because this is not even funny. Who came up with a system that constantly exceeds the max output token limit and burns tokens while doing so? It feels like a …