Hey everyone, if I’m posting in the wrong place, forgive me, I’ve been searching for a solution all day.
So I’ve been using even the basic pre-built Conversational web app in Google AI Studio, and suddenly I’m hitting a quota limit. Payment accounts are sorted, the billing account is good, and we have credits in place.
I’m really stumped - using gemini-2.5-flash-preview-native-audio-dialog - contacted Google Billing and they sent me here… Really hope someone can help!
Please go to the GCP console and click “APIs & Services”. Under Metric, search for and select “Generative Language API”. Under the “Quotas & System Limits” tab, check the “Current Usage percentage”.
If it reaches 100%, you have hit your quota limit and may get a 429 error.
If you think that there is any discrepancy, please DM me with a clear error message and Project ID to help us investigate further.
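For anyone hitting intermittent 429s while this gets investigated, here is a minimal retry-with-backoff sketch. It is not from the Gemini docs: `RateLimitError` and `call_with_backoff` are hypothetical names, and you would need to swap `RateLimitError` for whatever exception your client library actually raises on a 429.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever 429 error your client library raises."""


def call_with_backoff(request_fn, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call request_fn, retrying with exponential backoff and jitter on RateLimitError."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the 429 to the caller
            # Delay doubles on each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter spreads retries out so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Usage would look like `call_with_backoff(lambda: send_request(prompt))`, where `send_request` is your own function. This only smooths over short spikes; it will not help if the quota itself is genuinely exhausted.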
Hey Krish, so I looked and we’re not exceeding any quotas. There are no tools exceeding quotas, but we have a lot of items like these marked with 0. I haven’t changed anything in here that would have started the issue, though; I’m not sure if something might have changed without my knowing.
Hmm, I don’t see the gemini-2.5-flash-preview-native-audio-dialog model listed in the attached screenshot, and you have no usage in this account. Can you reconfirm that you are checking stats from the same account that was used to test the native-audio model?
A few days ago, gemini-2.5-flash returned a quota limit error and the Google Cloud dashboard showed 99% usage of the tokens-per-minute limit. However, I could not actually have hit that limit, because I had not used the Gemini Flash API for a few days and it was the first call of that day (and of course my request did not contain 1M tokens). After some time the error disappeared on its own, but I think this is clearly an API server error.
It seems that the gemini-2.5 family is still extremely unstable…
We are rapidly evolving our models and had 2 releases within the last 10 days. We’re aware of server limit issues caused by increased load during releases. Maybe your usage fell within such a window.
However, if you face a token limit issue again and you believe there is a discrepancy, feel free to DM me with your Project ID to help us troubleshoot.
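If you want evidence for a suspected tokens-per-minute discrepancy, one option is to keep your own client-side tally and compare it with the dashboard. The sketch below is only an estimate under assumptions: `TokenWindow` is a hypothetical helper, `limit_per_minute` is whatever your quota page shows, and the token counts are whatever your client reports per request. The server may count things differently.

```python
import time
from collections import deque


class TokenWindow:
    """Tracks tokens sent in the last `window_seconds` (client-side estimate only)."""

    def __init__(self, limit_per_minute, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit_per_minute
        self.window = window_seconds
        self.clock = clock
        self.events = deque()  # (timestamp, token_count) pairs, oldest first

    def record(self, token_count):
        """Log the token count of a request that was just sent."""
        self.events.append((self.clock(), token_count))

    def usage_percent(self):
        """Return tokens used in the current window as a percentage of the limit."""
        cutoff = self.clock() - self.window
        # Drop entries that have aged out of the window.
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()
        used = sum(count for _, count in self.events)
        return 100.0 * used / self.limit
```

Call `record(n)` after each request and log `usage_percent()` next to any 429 you receive. If your own number is near zero while the dashboard shows 99%, that is a concrete data point to include with your Project ID.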
Hey Krish, sure. It’s the same account in Google AI Studio as in the Google Cloud console; I can’t see any differences. I’ve requested a quota uplift, so I’ll speak to the Sales people to see how they can help.
Thanks for looking into this. By the sounds of the other comments, it’s a recurring issue. Hopefully we can find a way to fix it.
Any resolution on this? I have the EXACT same issue as the original poster with that pre-built app: you add your API key and it lets you barely use anything. Nothing is over its limits per the console either.
Hi Krish,
I have a slightly different issue. I have billing enabled, have been using it for more than 3 months, and have exceeded $300 in spend, but I’m still stuck on Tier 1.
I submitted a request to increase the limit through IAM quotas 3 times, and it was rejected.
Any help promoting our account to Tier 2 would be appreciated.
1000 RPD for the gemini-2.5-pro model is too low for our use case.
Thanks
Balaji
So from talking with my sales guy at Google, he basically recommended migrating to Vertex AI from Google AI Studio as the quota limits are just too low regardless of tier - this is because it’s a free service at point of access (like a teaser before you jump in and pay to play).
So I’m now enjoying the Google Cloud puzzle game where I’m breaking things repeatedly and wrapping my head around documentation. The shame for me is that we’re using the Voice AI for a bidirectional conversation, which there seems to be little information on setting up and such.
I hope there can be relevant documentation simply laid out to solve these increasingly used-by-noobs services soon.
Cheers & hope you’re all finding some useful info out.
You are right. Vertex AI is designed for building production-quality services and offers much more flexibility for customizing models as well. I’m glad you got this info and are heading down the right path.
Can you be specific on use-cases that you are trying to implement and areas where you feel documentation needs to be improved?
I recently forked the AI Studio demo for the Gemini Live API and noticed some inconsistencies with the quota limits. According to the “Quotas and Limits” page, the free tier for the Gemini API is listed as 5 requests per minute (RPM). However, in my AI Studio dashboard, it shows a limit of 50 RPM, and my account is marked as Tier 1. This discrepancy is confusing, and I’d appreciate clarification on how these limits are applied (e.g., per API key, per project, or per account). Screenshots of my dashboard and the limits page are attached for reference.
While the Gemini Live API is impressive, the current rate limits feel restrictive for prototyping and development. For comparison, other LLM providers offer simpler API access with higher limits, making it easier to build and test applications quickly. The process of navigating tiers and quotas in Google Cloud feels like an unnecessary hurdle, detracting from an otherwise excellent product.
Could someone from the Google team clarify the following:
Why does my AI Studio show 50 RPM and Tier 1 while the documentation indicates 5 RPM for the free tier? Is this a bug or an intentional difference?
What are the steps to request a quota increase for the Gemini Live API to support more robust prototyping? I would like to be on Tier 3.
Does Vercel AI provide access to the Gemini Live API with higher limits, or is it subject to the same restrictions as Google’s platform?
When do these limits reset?
It would be incredibly helpful if Google could streamline the process for accessing and scaling API limits to make development more seamless. Any guidance on how to resolve these issues and get back to building would be greatly appreciated.
I have a demo to show with the Gemini Live API on Monday, but now I’m worried I can’t improve the product for fear of getting iced out of actually showing the demo to my audience.
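For prototyping under a low RPM cap in the meantime, a client-side limiter can help keep a demo under whatever number applies to you. This is a rough sketch, not anything official: `RpmLimiter` is a hypothetical helper, and whether the real limit is counted per API key, per project, or per account is exactly the open question above, so this only throttles a single process.

```python
import time
from collections import deque


class RpmLimiter:
    """Blocks until another request fits under `max_per_minute` (client-side only)."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.max = max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of requests in the last 60 seconds

    def acquire(self):
        """Wait, if needed, until a request can be sent without exceeding the cap."""
        while True:
            now = self.clock()
            # Forget requests older than 60 seconds.
            while self.sent and now - self.sent[0] >= 60.0:
                self.sent.popleft()
            if len(self.sent) < self.max:
                self.sent.append(now)
                return
            # Wait until the oldest request leaves the window.
            self.sleep(60.0 - (now - self.sent[0]))
```

Create one limiter, for example `RpmLimiter(5)` if the 5 RPM figure turns out to be the one that applies, and call `acquire()` right before each request. Setting it a little below the documented cap leaves some headroom in case the server counts requests differently.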