Tier 3 Project – Persistent 503 & 429 Errors in Production (No Communication / Need ETA)

Hi team,

We are currently experiencing ongoing issues with the Gemini API on a Tier 3 project (production environment), and we’re looking for clarification and guidance.

Issue Summary

For the past several days, we’ve been seeing recurring:

  • 503 (Service Unavailable)

  • 429 (Too Many Requests)

These errors are happening consistently and are significantly impacting production usage.

Observations

  • Errors started a few days ago and have been persisting without resolution

  • No official communication or incident report found so far

  • The issue appears intermittent but frequent enough to disrupt service

  • Error spikes are clearly visible in usage dashboards (attached screenshots)

  • Occurring even when traffic patterns remain relatively stable

Impact

  • Production degradation

  • Failed requests at scale

  • Unreliable API behavior despite being within expected usage patterns

Questions

  1. Is there an ongoing incident or degradation affecting Gemini API (Tier 3)?

  2. Are these 429s expected (rate limiting changes?) or unintended?

  3. Are the 503 errors related to backend instability or capacity issues?

  4. Is there an ETA for resolution?

  5. Any recommended mitigation strategies on our side?

Additional Context

  • Tier: 3

  • Timeframe: last 7 days (also visible in the 28-day trend)

  • Models used: Gemini 2.5 Flash / Flash Lite

Happy to provide more logs or request IDs if needed.

This is a critical production issue, so any visibility would be greatly appreciated.

Thanks in advance.

Error we keep experiencing:

GoogleGenerativeAIFetchError: [GoogleGenerativeAI Error]: Error fetching from https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-lite:generateContent: [503 Service Unavailable] This model is currently experiencing high demand. Spikes in demand are usually temporary. Please try again later.

Can anyone from the Google Team help with this?

We’ve been facing these errors for over a month now (approx. 30% of our requests fail, and we have to retry another 3 or 4 times before one succeeds), with no answer whatsoever so far. I’ve probably replied to more than 10 posts about this and never received a response.

Besides that, if you check the Gemini models on OpenRouter, they all show very concerning uptime figures, yet there is no status update on Google’s official API status page (for either Google AI Studio or Vertex).
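
On the client side, the manual retrying described above can be made more systematic with exponential backoff and jitter. Below is a minimal TypeScript sketch, not an official recommendation. The names `withRetry`, `isRetryable`, and `backoffDelayMs`, and the default option values, are hypothetical and chosen for illustration only. It also assumes the thrown error exposes a numeric `status` field, which should be checked against the SDK version in use. It does not fix a capacity problem on the server side; it only spaces retries out so they do not pile onto an already overloaded endpoint.

```typescript
// Hypothetical helper: retry a call on 429/503 with exponential backoff and full jitter.
type RetryOptions = {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // upper bound of the delay before the first retry
  maxDelayMs: number;  // upper bound of any single delay
};

// Assumption: the error object carries a numeric `status` property.
function isRetryable(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  return status === 429 || status === 503;
}

// `attempt` is 1 for the first retry. The cap doubles on each retry up to
// maxDelayMs, and the actual delay is a random value in [0, cap).
function backoffDelayMs(
  attempt: number,
  opts: RetryOptions,
  random: () => number = Math.random,
): number {
  const cap = Math.min(opts.maxDelayMs, opts.baseDelayMs * Math.pow(2, attempt - 1));
  return Math.floor(random() * cap);
}

async function withRetry<T>(
  call: () => Promise<T>,
  opts: RetryOptions = { maxAttempts: 5, baseDelayMs: 500, maxDelayMs: 8000 },
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up on non-retryable errors or once the attempt budget is spent.
      if (!isRetryable(err) || attempt >= opts.maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt, opts));
    }
  }
}
```

Usage would look something like `await withRetry(() => model.generateContent(prompt))`, where `model` and `prompt` are whatever the application already has. The jitter matters when many workers hit the same error at once: without it they all retry at the same moment and recreate the spike.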

Thanks for the reply.
The issue is still ongoing on our side and is affecting a production Tier 3 project with recurring 503 and 429 errors over several consecutive days.

This does not appear to be isolated transient throttling, as the error spikes are significant and visible directly in the API dashboard despite relatively stable traffic patterns.

Could someone from the Gemini/API infrastructure team please confirm:

  • whether there is an active backend degradation,

  • if rate limiting behavior has recently changed,

  • and whether there is any ETA or mitigation guidance for production customers?

It’s clearly just compute constraints. They don’t have enough GPUs, electricity, etc. to keep steady uptime for all customers (and their TPUs probably have a lot of restrictions too).

The problem is the lack of transparency: acting like everything is normal, showing wrong uptime figures on the official API status page, etc.