Consistently getting a 503 error when using the gemini-2.5-pro model with the Google GenAI SDK

I have written the following code:

import os
import time

from dotenv import load_dotenv
from google import genai
from google.genai import types

from prompts import *
from schemas import GroupBenefit


def extract_parameters(client, model, contents, config):
    print(f"\nAttempting extraction using model: {model}")
    response = client.models.generate_content(
        model=model,
        contents=contents,
        config=config
    )
    return response


def print_result(model: str, response: dict, start_time: float):
    print(f"\nExtraction successful using model: {model}!")
    print(response)
    print("\nTotal time taken: %s seconds" % round((time.time() - start_time), 2))


if __name__ == "__main__":
    start_time = time.time()

    load_dotenv()
    MAIN_MODEL = os.getenv('MAIN_MODEL')
    MINI_MODEL = os.getenv('MINI_MODEL')
    TEMPERATURE = float(os.getenv('TEMPERATURE'))

    client = genai.Client()

    input_file_path = "OutputFiles/Renewal 2025_extracted.txt"
    try:
        file = client.files.upload(
            file=input_file_path,
            config={'mime_type': 'text/plain'}
        )

        contents = [
            get_gb_prompt_for_excel(naic="23"),
            file
        ]

        config = types.GenerateContentConfig(
            system_instruction=get_system_prompt(),
            temperature=TEMPERATURE,
            response_mime_type='application/json',
            response_schema=GroupBenefit
        )

        model = MAIN_MODEL
        response = extract_parameters(
            client=client,
            model=model,
            contents=contents,
            config=config
        )
        if hasattr(response, "parsed") and response.parsed:
            print_result(model, response.parsed.model_dump_json(indent=2), start_time)
    except Exception as e:
        err = str(e).lower()
        if any(sub in err for sub in ["503", "unavailable", "overloaded"]):
            print(f"Model ({model}) overloaded. Retrying with another model…")
            try:
                model = MINI_MODEL
                response = extract_parameters(
                    client=client,
                    model=model,
                    contents=contents,
                    config=config
                )
                if hasattr(response, "parsed") and response.parsed:
                    print_result(model, response.parsed.model_dump_json(indent=2), start_time)
            except Exception as e:
                print("\nExtraction failed after all retries!")
                print(str(e))
                raise
        else:
            print(err)
            raise
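The nested try/except fallback above can also be flattened into a loop over a chain of models, which makes it easy to add more fallbacks later. This is a minimal sketch, not the SDK's API: `call_model` is a hypothetical stand-in for a wrapper around `client.models.generate_content`, and `fake_call` below just simulates the main model returning a 503.

```python
def extract_with_fallback(models, call_model):
    """Try each model in order; move to the next one on 503/overload errors."""
    last_err = None
    for model in models:
        try:
            return model, call_model(model)
        except Exception as e:
            err = str(e).lower()
            if any(sub in err for sub in ("503", "unavailable", "overloaded")):
                last_err = e
                continue  # overloaded: try the next model in the chain
            raise  # non-retryable error: surface it immediately
    raise RuntimeError("Extraction failed after all retries!") from last_err

# Demonstration with a stub that simulates the main model being overloaded.
def fake_call(model):
    if model == "gemini-2.5-pro":
        raise RuntimeError("503 UNAVAILABLE: The model is overloaded.")
    return {"model": model, "parsed": True}

model, response = extract_with_fallback(
    ["gemini-2.5-pro", "gemini-2.5-flash"], fake_call
)
print(model)  # gemini-2.5-flash
```

In real code, `call_model` would capture `client`, `contents`, and `config` in a closure or `functools.partial`, so only the model name varies per attempt.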

====================================================================

The main and mini models are as follows:

MAIN_MODEL=gemini-2.5-pro
MINI_MODEL=gemini-2.5-flash

Whenever I am running the code, I am almost always getting the following error for ā€œgemini-2.5-proā€, and sometimes for ā€œgemini-2.5-flashā€:

503 UNAVAILABLE. {'error': {'code': 503, 'message': 'The model is overloaded. Please try again later.', 'status': 'UNAVAILABLE'}}

I am using the free tier of the API.
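Since a 503 means "try again later", retrying the same model with exponential backoff before falling back can also help. Below is a minimal sketch under stated assumptions: `retry_with_backoff` is a hypothetical helper (not part of the SDK), and the `flaky` stub simulates two overload errors before succeeding; in real code `attempt_fn` would wrap the `generate_content` call.

```python
import time

def retry_with_backoff(attempt_fn, max_attempts=4, base_delay=1.0):
    """Retry on 503/overload errors, doubling the wait between attempts."""
    for attempt in range(max_attempts):
        try:
            return attempt_fn()
        except Exception as e:
            err = str(e).lower()
            retryable = any(s in err for s in ("503", "unavailable", "overloaded"))
            if not retryable or attempt == max_attempts - 1:
                raise  # non-retryable, or out of attempts
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Stub that fails twice with a 503 before succeeding.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("503 UNAVAILABLE: The model is overloaded.")
    return "ok"

result = retry_with_backoff(flaky, base_delay=0.01)
print(result)  # ok
```

Adding a small random jitter to each delay is a common refinement, so that many clients retrying at once do not hit the service in lockstep.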

Could you please resolve this issue as soon as possible?


I am having the same issue on my end.

Same here over the last few days.

Yes, same issue here today using either 2.5 pro or flash: model overloaded. I changed my location env variable from us-central1 to us-east4. It worked for a bit, and then the same errors returned.

We are observing the same behaviour across our apps, and surprisingly Google's status page does not acknowledge any service degradation.

Exactly the same here. New customer, not on the free tier. Here's the situation from yesterday: currently I couldn't go anywhere near this for production. Simpler prompts get through two out of three times; the more involved ones take 5-10 retries before finally getting through. All 503s.