Consistently getting a 503 error when using the gemini-2.5-pro model with the Google GenAI SDK

I have written the following code:

import os
import time

from dotenv import load_dotenv
from google import genai
from google.genai import types

from prompts import *
from schemas import GroupBenefit


def extract_parameters(client, model, contents, config):
    print(f"\nAttempting extraction using model: {model}")
    response = client.models.generate_content(
        model=model,
        contents=contents,
        config=config
    )
    return response


def print_result(model: str, response: dict, start_time: float):
    print(f"\nExtraction successful using model: {model}!")
    print(response)
    print("\nTotal time taken: %s seconds" % round((time.time() - start_time), 2))


if __name__ == "__main__":
    start_time = time.time()

    load_dotenv()
    MAIN_MODEL = os.getenv('MAIN_MODEL')
    MINI_MODEL = os.getenv('MINI_MODEL')
    TEMPERATURE = float(os.getenv('TEMPERATURE'))

    client = genai.Client()

    input_file_path = "OutputFiles/Renewal 2025_extracted.txt"
    try:
        file = client.files.upload(
            file=input_file_path,
            config={'mime_type': 'text/plain'}
        )

        contents = [
            get_gb_prompt_for_excel(naic="23"),
            file
        ]

        config = types.GenerateContentConfig(
            system_instruction=get_system_prompt(),
            temperature=TEMPERATURE,
            response_mime_type='application/json',
            response_schema=GroupBenefit
        )

        model = MAIN_MODEL
        response = extract_parameters(
            client=client,
            model=model,
            contents=contents,
            config=config
        )
        if hasattr(response, "parsed") and response.parsed:
            print_result(model, response.parsed.model_dump_json(indent=2), start_time)
    except Exception as e:
        err = str(e).lower()
        if any(sub in err for sub in ["503", "unavailable", "overloaded"]):
            print(f"Model ({model}) overloaded. Retrying with another model…")
            try:
                model = MINI_MODEL
                response = extract_parameters(
                    client=client,
                    model=model,
                    contents=contents,
                    config=config
                )
                if hasattr(response, "parsed") and response.parsed:
                    print_result(model, response.parsed.model_dump_json(indent=2), start_time)
            except Exception as e:
                print("\nExtraction failed after all retries!")
                print(str(e))
                raise
        else:
            print(err)
            raise
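The nested try/except fallback above can also be flattened into a loop over a chain of models, which makes it easy to add more fallbacks later. This is a minimal sketch, not the SDK's API: `call_model` is a hypothetical stand-in for a wrapper around `client.models.generate_content`, and `fake_call` below just simulates the main model returning a 503.

```python
def extract_with_fallback(models, call_model):
    """Try each model in order; move to the next one on 503/overload errors."""
    last_err = None
    for model in models:
        try:
            return model, call_model(model)
        except Exception as e:
            err = str(e).lower()
            if any(sub in err for sub in ("503", "unavailable", "overloaded")):
                last_err = e
                continue  # overloaded: try the next model in the chain
            raise  # non-retryable error: surface it immediately
    raise RuntimeError("Extraction failed after all retries!") from last_err

# Demonstration with a stub that simulates the main model being overloaded.
def fake_call(model):
    if model == "gemini-2.5-pro":
        raise RuntimeError("503 UNAVAILABLE: The model is overloaded.")
    return {"model": model, "parsed": True}

model, response = extract_with_fallback(
    ["gemini-2.5-pro", "gemini-2.5-flash"], fake_call
)
print(model)  # gemini-2.5-flash
```

In real code, `call_model` would capture `client`, `contents`, and `config` in a closure or `functools.partial`, so only the model name varies per attempt.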

====================================================================

The main and mini models are as follows:

MAIN_MODEL=gemini-2.5-pro
MINI_MODEL=gemini-2.5-flash

Whenever I am running the code, I am almost always getting the following error for ā€œgemini-2.5-proā€, and sometimes for ā€œgemini-2.5-flashā€:

503 UNAVAILABLE. {'error': {'code': 503, 'message': 'The model is overloaded. Please try again later.', 'status': 'UNAVAILABLE'}}

I am using the free tier of the API.
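Since a 503 means "try again later", retrying the same model with exponential backoff before falling back can also help. Below is a minimal sketch under stated assumptions: `retry_with_backoff` is a hypothetical helper (not part of the SDK), and the `flaky` stub simulates two overload errors before succeeding; in real code `attempt_fn` would wrap the `generate_content` call.

```python
import time

def retry_with_backoff(attempt_fn, max_attempts=4, base_delay=1.0):
    """Retry on 503/overload errors, doubling the wait between attempts."""
    for attempt in range(max_attempts):
        try:
            return attempt_fn()
        except Exception as e:
            err = str(e).lower()
            retryable = any(s in err for s in ("503", "unavailable", "overloaded"))
            if not retryable or attempt == max_attempts - 1:
                raise  # non-retryable, or out of attempts
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Stub that fails twice with a 503 before succeeding.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("503 UNAVAILABLE: The model is overloaded.")
    return "ok"

result = retry_with_backoff(flaky, base_delay=0.01)
print(result)  # ok
```

Adding a small random jitter to each delay is a common refinement, so that many clients retrying at once do not hit the service in lockstep.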

Could you please resolve this issue as soon as possible?


I am having the same issue on my end.

Same here over the last few days.

Yes, same issue here today using either 2.5 pro or flash: model overloaded. I changed my location env variable from us-central1 to us-east4. It worked for a bit, and then the same errors returned.

We are observing the same behaviour across our apps, and surprisingly Google's status page does not acknowledge any service degradation.

Exactly the same here. New customer, not on the free tier. Here's the situation from yesterday: currently I couldn't go anywhere near this for production. Simpler prompts get through two out of three times; the more involved ones take 5-10 retries before finally getting through. All 503s.