We’re experiencing a dramatic increase in HTTP 429 rate limit errors when using the Gemini API through Vertex AI. The issue first appeared in the past few days.
Since we’re using the global endpoint with Dynamic Shared Quota (DSQ), we don’t believe this is a straightforward quota limit issue.
I started seeing a spike in 429s as well, with gemini-2.5-flash-image. It sure seems like something changed on Google’s side. The API has become nearly unusable due to the volume of 429s. This does not seem like normal DSQ behavior, as I have seen the same error rate consistently every day for the last few days.
Help! From one of my servers I am not getting any 429 errors, but from 2 servers located on customer premises (different URL than mine) I am getting a ton of 429s. I tested with the same Vertex key to rule out any quota issues. I tried moving to global instead of us-central1, with no change. Has anyone found a solution or a tip that works?
This or provisioned are the only ways I can get it to run consistently. Google is pay to play: pay more, get more priority in resources, it would seem.
I’ve fought them for a long time on this. They have other resources, like the guidance on backoff with jitter. When I had a call with them, they told me that if I went to Priority Pay Go, they expected my errors would drop to 0. Long story short, I have an intense use case, so they didn’t drop to 0, but it’s much more usable now and seems to work better. Gemini 3.1 is off the table for now; it takes a lot longer and I get more errors.
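For anyone who wants to try the backoff-with-jitter route, here is a rough sketch in Python of exponential backoff with "full jitter". It is not Google's client code: `RateLimitError`, `backoff_delays`, and `call_with_backoff` are hypothetical names made up for illustration, and the base delay, cap, and retry count are arbitrary assumptions you would tune for your own traffic. You would map whatever exception your client raises on HTTP 429 to `RateLimitError`.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever your client raises on HTTP 429."""


def backoff_delays(max_retries=5, base=1.0, cap=32.0, rng=random.random):
    """Yield full-jitter delays: rng() * min(cap, base * 2**attempt)."""
    for attempt in range(max_retries):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_backoff(fn, max_retries=5, base=1.0, cap=32.0, sleep=time.sleep):
    """Call fn(); on RateLimitError, sleep a jittered delay and retry.

    Makes up to max_retries + 1 attempts in total. The last attempt is
    not wrapped, so a persistent 429 propagates to the caller.
    """
    for delay in backoff_delays(max_retries, base, cap):
        try:
            return fn()
        except RateLimitError:
            sleep(delay)
    return fn()  # final attempt; let the error propagate
```

Usage would look something like `call_with_backoff(lambda: my_request())`, where `my_request` is your own function that raises `RateLimitError` on a 429. The random jitter is there so that many workers hitting the limit at the same moment do not all retry in lockstep. It will not fix a capacity problem on the provider side, but it can smooth out bursts.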