Sudden Spike in 429 Errors with Gemini 2.5 via Vertex AI Global Endpoint

yj_s · January 22, 2026, 4:45am

We’re experiencing a dramatic increase in HTTP 429 rate limit errors when using Gemini API through Vertex AI. This issue appeared in the past few days.

Since we’re using the global endpoint with DSQ, we don’t believe this is a straightforward quota limitation issue.

Configuration:

Platform: Google Cloud Vertex AI
Endpoint: Global region
Models: Gemini 2.5 series (text generation)
Method: google.cloud.aiplatform.v1beta1.PredictionService.StreamGenerateContent

30-Day Metrics:

Total Requests: 156,579
Error Rate: 1.01%
Average Latency: 10.538 seconds
99th Percentile Latency: 32.764 seconds

Daily error rate trends over the past 30 days

스크린샷 2026-01-22 오후 1.29.011162×714 19.6 KB

Questions:

Could our system prompt characteristics be influencing rate limiting behavior?
What could be causing this sudden rate limit increase? Are there known platform issues or any changes?

Mohammad_Abbas · January 23, 2026, 3:08pm

getting an error rate of almost 100% today

tylertreat · January 26, 2026, 9:52pm

I started seeing a spike in 429s as well with gemini-2.5-flash-image. It sure seems like something changed on Google’s side. The API has become nearly unusable due to the volume of 429s. Does not seem like normal DSQ behavior as I see the same error rate consistently every day for the last few days now.

yj_s · January 29, 2026, 1:24am

Over 80% of the requests are for the Gemini 2.5 Pro model.

Idea_Making_Technolo · March 18, 2026, 6:41am

Help! From one of my servers I am not getting any 429 errors , for 2 servers located on customer premises (different url than mine) I am getting a ton of 429 - tested with same vertex key to eliminate any quota issues. Tried moving to global instead of us-central1 - no change. Anyone found a solution or a tip that works ?

Brody · March 20, 2026, 8:26pm

This or provisioned are the only way I can get it to run consistently. Google is pay to play. pay more, get more priority in resources it would seem.

I’ve fougth them for a long time on this. They have other resources like the backoff jitter. When I had a call with them, they told me that if I go to Priority Pay Go, they expect my errors would drop to 0. Long story short I have an intense use case so they didn’t but its much more usable now and seems to work better. Gemini 3.1 is off the table for now. takes a lot longer and I get more errors.

Topic		Replies	Views
Spike in 429 RESOURCE_EXHAUSTED with v1beta1 StreamGenerateContent (Gemini 3 Flash Preview / Vertex global) — quotas look fine Gemini API gemini-api , live-streaming , gemini-3	3	263	February 18, 2026
Issue with 429 Error on Gemini API Despite Staying Within Rate Limits Gemini API gemini-api	13	1754	March 10, 2026
Gemini-2.5-flash-image: Frequent 429 RESOURCE_EXHAUSTED during sequential image generation - seeking clarity on rate limits Gemini API gemini-api , vertex-ai , rate-limits , image-generation	7	595	February 10, 2026
Unexpected 429 Errors on Vertex AI (Gemini 2.0/2.5 Flash Lite) via Firebase Functions despite <0.1% quota usage Gemini API bug , vertexai , gemini , vertex-ai , firebase	1	200	February 23, 2026
Urgent help with 429 gemini-3-pro-image-preview Gemini API vertexai , gemini-3	7	462	January 27, 2026

Sudden Spike in 429 Errors with Gemini 2.5 via Vertex AI Global Endpoint

Related topics