I am currently using the Gemini 2.5 Pro API on a paid basis.
Since it is still pre-release, only a few test users are using it, but I frequently encounter:
{"code":503,"message":"The model is overloaded. Please try again later.","status":"UNAVAILABLE"}
Reviewing Sentry logs shows this error occurred over 10 times within a single week.
My API request volume and token usage are not excessive, and I am not exceeding tier-specific limits.
I’ve noticed many others seem to be experiencing this issue. Has anyone found a solution?
We need to launch soon, and this error is occurring frequently enough to be a serious blocker.
Why it Works
Exponential back-off is a crucial part of this strategy. Instead of retrying immediately, it waits for a short period before the next attempt, and this waiting period increases exponentially with each failed attempt. This prevents your code from overwhelming the API with repeated requests during a service outage, which could make the problem worse. The backoff_factor of 2 you’ve used is a common and effective choice.
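For reference, here is a minimal sketch of that pattern in Python. `call_model` is a hypothetical zero-argument callable standing in for your actual Gemini request, and the broad `except Exception` is a placeholder for the SDK's specific 503/UNAVAILABLE error:

```python
import random
import time

def call_with_backoff(call_model, max_retries=5, backoff_factor=2):
    """Retry a transient-failure-prone API call with exponential back-off.

    `call_model` is a hypothetical zero-argument callable that performs
    the actual Gemini request and raises an exception on failure.
    """
    for attempt in range(max_retries):
        try:
            return call_model()
        except Exception:  # in real code, catch the SDK's 503/UNAVAILABLE error
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            # Wait 1s, 2s, 4s, 8s, ... plus random jitter so that
            # concurrent clients don't all retry at the same instant.
            time.sleep(backoff_factor ** attempt + random.uniform(0, 1))
```

The jitter is worth keeping: without it, many clients that failed together will retry together, reproducing the overload on every attempt.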
Token limits: You're also right to mention these. While retries help your client tolerate transient unavailability, they can't fix fundamental issues like exceeding the model's maximum input or output token count. If a prompt is too long, the API will reject it consistently, regardless of how many times you retry. The fix for that specific problem is to truncate the prompt, summarize it, or split it into smaller chunks before making the API call.
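As a rough illustration of the chunking option, here is a naive character-based splitter. The 12,000-character budget and the helper names are illustrative assumptions, and characters only approximate tokens, so a real implementation should measure with an actual token counter before sending each chunk:

```python
def split_prompt(text: str, max_chars: int = 12000) -> list[str]:
    """Naively split an over-long prompt into fixed-size character chunks.

    NOTE: max_chars is an illustrative assumption; characters only
    approximate tokens, so measure with a real token counter in practice.
    """
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# Hypothetical usage: call the model once per chunk instead of once
# with the full document ('generate' stands in for your request function).
# for chunk in split_prompt(long_document):
#     response = call_with_backoff(lambda: generate(chunk))
```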
In short, your implementation is a well-engineered way to make your application more resilient to common API-related problems. Happy coding indeed!
Thanks, this is actually a smart approach, and I've integrated it. Hopefully it works, but it's not easy to apply when API calls go through an intermediary (an AI orchestration solution such as flowise or n8n); in those cases the retries should probably be handled on the API provider's side. IMO.