Sudden Spike in 429 Errors with Gemini 2.5 via Vertex AI Global Endpoint

We’re experiencing a dramatic increase in HTTP 429 rate-limit errors when calling the Gemini API through Vertex AI. The issue first appeared in the past few days.

Since we’re using the global endpoint with Dynamic Shared Quota (DSQ), we don’t believe this is a straightforward quota-limit issue.

Configuration:

  • Platform: Google Cloud Vertex AI

  • Endpoint: Global region

  • Models: Gemini 2.5 series (text generation)

  • Method: google.cloud.aiplatform.v1beta1.PredictionService.StreamGenerateContent

30-Day Metrics:

  • Total Requests: 156,579

  • Error Rate: 1.01%

  • Average Latency: 10.538 seconds

  • 99th Percentile Latency: 32.764 seconds

[Chart: daily error rate trend over the past 30 days]

Questions:

  • Could our system prompt characteristics be influencing rate limiting behavior?

  • What could be causing this sudden increase in rate limiting? Are there known platform issues or recent changes?
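In the meantime we’re surviving by retrying with exponential backoff and jitter. A minimal sketch of what we do (`RateLimitError` here is a stand-in for whatever 429 exception your client library raises, so adapt the `except` clause accordingly):

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for the 429 error your Vertex AI client raises."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=32.0):
    """Call fn(), retrying on 429s with exponential backoff plus jitter.

    Raises the last RateLimitError if all retries are exhausted.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise
            # Double the delay each attempt, capped at max_delay,
            # with up to 10% random jitter to avoid retry stampedes.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay + random.uniform(0, delay * 0.1))
```

This doesn’t fix the underlying spike, but it keeps the error rate our users see down while we wait for an answer.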

6 Likes

Getting an error rate of almost 100% today.

2 Likes

I started seeing a spike in 429s as well with gemini-2.5-flash-image. It sure seems like something changed on Google’s side; the API has become nearly unusable due to the volume of 429s. This doesn’t look like normal DSQ behavior either, as I’ve seen the same error rate consistently every day for the last few days.

1 Like

Over 80% of the requests are for the Gemini 2.5 Pro model.