Gemini 503 issue

Dhruv_Kabra · April 13, 2026, 2:08pm

Despite being on Tier 3, we are consistently hitting 503 UNAVAILABLE errors that persist for hours, not minutes. Our retry strategy (4
retries with equal-jitter exponential backoff up to 120s) is being fully exhausted, and the google-genai SDK’s internal retry (~31s) is
also failing.
Questions:

1. Is 503 UNAVAILABLE subject to different handling than 429 RESOURCE_EXHAUSTED? Our understanding is that 429 is a true rate limit and 503 is server-side capacity, but the Tier 3 docs don’t distinguish between them. Should we expect 503s even within documented rate limits?
2. Is there a recommended maximum concurrent request count for Tier 3 that isn’t reflected in the RPM/TPM limits? We’re within RPM but may be hitting a per-second or per-connection limit that isn’t documented.
3. For batch workloads (100-500 documents), is the Vertex AI Batch API the recommended path? We see that it’s “not subject to real-time rate limiting”. Would this eliminate the 503s entirely?
4. Are there any request headers or parameters we can set to signal batch/low-priority traffic that would be handled more gracefully during capacity spikes?

Siddharth_Naik · April 21, 2026, 9:01am

Hello @Dhruv_Kabra,
503 (“Service Unavailable”) errors are unrelated to your quota. Instead, they indicate that our services are temporarily at capacity and cannot process your request at that moment. You might notice that these errors are more common during certain times of the day.

1. A 429 error means you have hit your account quota (RPM/TPM), whereas a 503 error means the servers are at capacity. Because Tier 3 relies on shared (rather than dedicated) hardware, you can still encounter 503 errors within your rate limits during massive regional demand spikes.
2. While there is no hard limit on concurrent requests, sending large, instantaneous bursts can overwhelm the API and cause it to return 503 errors. Pacing your outbound requests is generally a safer approach than relying solely on retries.
3. The Batch API is an excellent choice for large workloads, as it is designed to process high volumes of requests asynchronously at a reduced cost. Because the Batch API queues your jobs, it helps avoid 503 errors entirely. Please note that while the target turnaround time is 24 hours, jobs complete much sooner in the majority of cases.
4. Ultimately, the Batch API is our recommended method for handling bulk, low-priority, or asynchronous work.
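
For anyone wanting a starting point for the "pace your requests" advice above, here is a minimal Python sketch that combines client-side pacing with the equal-jitter backoff described in the original post. Everything in it is an assumption for illustration: `make_call` is a hypothetical zero-argument coroutine function wrapping whatever SDK call you actually make, `TransientError` is a stand-in for whatever exception your client raises on 503 UNAVAILABLE, and the `max_in_flight` / `min_interval` values are placeholders, not documented limits.

```python
import asyncio
import random
import time
from typing import Optional


class TransientError(Exception):
    """Hypothetical stand-in for the exception your client raises on 503."""


def equal_jitter_delay(attempt: int, base: float = 1.0, cap: float = 120.0,
                       rng: Optional[random.Random] = None) -> float:
    """Equal jitter: half of the capped exponential delay is fixed, half is random."""
    source = rng if rng is not None else random
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling / 2 + source.uniform(0, ceiling / 2)


class Pacer:
    """Caps requests in flight and enforces a minimum gap between request starts."""

    def __init__(self, max_in_flight: int, min_interval: float):
        self._sem = asyncio.Semaphore(max_in_flight)
        self._lock = asyncio.Lock()
        self._min_interval = min_interval
        self._next_start = 0.0

    async def run(self, make_call):
        async with self._sem:
            # The lock serializes starts so a burst is spread out over time.
            async with self._lock:
                now = time.monotonic()
                wait = self._next_start - now
                if wait > 0:
                    await asyncio.sleep(wait)
                self._next_start = max(now, self._next_start) + self._min_interval
            return await make_call()


async def call_with_retry(pacer: Pacer, make_call, max_retries: int = 4,
                          sleep=asyncio.sleep):
    """Retry transient failures with equal-jitter backoff; every attempt is paced."""
    for attempt in range(max_retries + 1):
        try:
            return await pacer.run(make_call)
        except TransientError:
            if attempt == max_retries:
                raise
            await sleep(equal_jitter_delay(attempt))
```

To use it, create one `Pacer` inside your async entry point (for example `Pacer(max_in_flight=8, min_interval=0.25)`), share it across all workers, and wrap each request as `await call_with_retry(pacer, make_call)`. Retries then pass through the same pacing gate as first attempts, so a wave of 503s does not turn into a retry burst.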
