As other posts have noted, there seems to be a persistent bug that causes 429 errors on the Europe endpoints. In our case, the affected model is gemini-2.5-flash-lite.
It makes Vertex AI extremely unreliable. Despite exponential backoff, we constantly get ‘too many requests’ and ‘resource exhausted’ errors for periods of a few hours, after which they go away.
Our code is configured to try all the Europe endpoints, yet we get this error more or less regardless of which one we try.
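For anyone comparing setups, here is a rough sketch of the kind of retry loop described above: try every region once per round, then back off exponentially with jitter before the next round. It is not our production code. `send`, `RetryableError`, and all the default values are hypothetical placeholders, so swap in your own client call and whichever exception your SDK raises for 429/503.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical stand-in for whatever your client raises on 429/503."""


def call_with_failover(send, regions, max_rounds=4, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call send(region) for each region in turn, backing off between rounds.

    `send` is an assumed callable that performs one request against the
    given region and raises RetryableError on a throttling response.
    """
    if not regions:
        raise ValueError("at least one region is required")
    last_error = None
    for attempt in range(max_rounds):
        # Failover: give every region one chance before sleeping.
        for region in regions:
            try:
                return send(region)
            except RetryableError as err:
                last_error = err
        if attempt < max_rounds - 1:
            # Full jitter: sleep a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    # Every region failed in every round; surface the last error seen.
    raise last_error
```

Note that if the quota is exhausted in every region at once, as seems to be the case here, a loop like this only spreads the retries out. It cannot make the requests succeed.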
Are there any SLAs in place for this, and is this a known issue for which a fix is being deployed? Our customers are unhappy with the latency and we will ultimately switch to another provider if this persists.
Same problem here! I keep getting 429 errors while trying to use Gemini-2.5-flash-lite via Vertex AI on paid tier 3. I am using the same failover approach as @mkaloer, but it doesn’t seem to work.
Any update from the technical team would be appreciated!
Hi all,
Got this from Google Cloud support. I’ll keep you updated if I hear more.
Hello,
Thank you for reaching out. I have taken a closer look and it appears that your issue is related to a product outage that has been resolved as of 2026-02-24 05:37 PST. Our team is still working on investigating the root cause of the issue. I will keep you posted once hearing from our team.
ApiError: {"error":{"code":503,"message":"This model is currently experiencing high demand. Spikes in demand are usually temporary. Please try again later.","status":"UNAVAILABLE"}}
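In case it helps anyone handling both the 429 and the 503 responses in one place, here is a small sketch that decides whether an error body is worth retrying. It assumes you already have the JSON part of the error as a plain string (without the "ApiError: " prefix and with straight quotes). The function name and the choice of which codes count as retryable are my own assumptions, based only on the errors reported in this thread.

```python
import json

# Assumption: only the codes/statuses seen in this thread are treated as retryable.
RETRYABLE_CODES = {429, 503}
RETRYABLE_STATUSES = {"RESOURCE_EXHAUSTED", "UNAVAILABLE"}


def is_retryable(error_body):
    """Return True if the JSON error body looks like a transient failure."""
    try:
        error = json.loads(error_body)["error"]
    except (ValueError, KeyError, TypeError):
        # Not JSON, or not shaped like {"error": {...}}: do not retry blindly.
        return False
    if not isinstance(error, dict):
        return False
    return (error.get("code") in RETRYABLE_CODES
            or error.get("status") in RETRYABLE_STATUSES)
```

Anything that does not parse is treated as non-retryable here, which is a conservative choice. You may prefer the opposite for your workload.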
It would be nice to have a proper status page that shows the true error status, not just the incidents they choose to acknowledge.