I am encountering an issue with the Gemini Flash 1.5 API, where I receive a 429 (Too Many Requests) error after making only 2-4 requests, despite the documented rate limit being 2000 requests per minute (RPM) for my paid plan.
I have already attempted adding delays between requests, yet the issue persists. Based on the API documentation, I should be well within the allowed rate limits, but the system is still throttling my requests unexpectedly.
Could you please help me understand:
- Whether there are any undocumented limitations or restrictions that might be causing this?
- If my account or API key is being incorrectly rate-limited?
- Any recommended troubleshooting steps to resolve this issue?
I’m having the same issue and can’t find any solution. I’m on the paid tier, using gemini-2.0-flash.
Yup, same issue on my side. I’m processing documents at a rate of one small 1-page PDF every 5 minutes, and after 3 documents I get a 429 error. I’m also using gemini-2.0-flash.
I’m on a paid GCP account, with a billing profile that has many other GCP projects with paid-for resources that all work perfectly fine.
Same issue.
I’m on a paid plan but always get a 429 Too Many Requests error.
Hi all,
Are you still facing 429 errors?
Please follow the instructions below to troubleshoot:
Go to the GCP console and click “APIs & Services”. Under Metric, search for and select “Generative Language API”. Under the “Quotas & System Limits” tab, check the “Current Usage percentage”.
If it has reached 100%, you have hit your quota limit, which is why you are getting the 429 error.
If you think that there is any discrepancy, please DM me with a clear error message and Project ID to help us investigate further.
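While the quota check above is being investigated, a common client-side mitigation is to retry 429 responses with exponential backoff and jitter rather than a fixed delay. The following is only a minimal sketch: `RateLimitError` and `call_with_backoff` are hypothetical names, not part of any Google SDK, and you would need to map your client's actual 429 exception onto `RateLimitError`. It will not help if the project's quota itself is set lower than expected.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever your client raises on HTTP 429."""


def call_with_backoff(request_fn, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call request_fn, retrying on RateLimitError with exponential backoff.

    The wait before retry k (starting at 0) is a random value between 0 and
    min(max_delay, base_delay * 2**k) seconds ("full jitter").
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except RateLimitError:
            # Give up and re-raise once the last attempt has failed.
            if attempt == max_attempts - 1:
                raise
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

To use it, wrap the request in a function that raises `RateLimitError` when the API returns 429, then pass that function to `call_with_backoff`.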
I’m using Vertex AI. I am most certainly below 100%, yet I am stuck at 5 requests per minute.
I hit that limit very quickly with embeddings (very small token input):
Quota exceeded for aiplatform.googleapis.com/online_prediction_requests_per_base_model with base model: gemini-embedding
pid: crm-system-8e12d
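The error text above names both the quota metric and the base model being throttled, which are the terms to search for on the Quotas page. As a rough sketch, assuming the message always follows the wording shown above (the format is not confirmed anywhere in this thread), a hypothetical helper `parse_quota_error` could pull the two values out for logging:

```python
import re

# Pattern based only on the single error message quoted in this thread.
QUOTA_PATTERN = re.compile(
    r"Quota exceeded for (?P<metric>\S+) with base model: (?P<model>\S+)"
)


def parse_quota_error(message):
    """Return (metric, base_model) from a quota error message, or None."""
    match = QUOTA_PATTERN.search(message)
    if match is None:
        return None
    # Drop a trailing period in case the message ends a sentence.
    return match.group("metric"), match.group("model").rstrip(".")
```

Logging the metric name alongside each 429 makes it easier to tell a per-model limit from a project-wide one.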
Hi @Halid_Rian,
Got it. Thanks for sharing your PID. I have escalated your issue to the Engineering team, and we will investigate it soon.
Appreciate your patience!