Consistently getting 503 errors when calling the Gemini-2.5-Pro model through the Google GenAI SDK

I wrote the following code:

import os
import time

from dotenv import load_dotenv
from google import genai
from google.genai import types

from prompts import *
from schemas import GroupBenefit

def extract_parameters(
    client,
    model,
    contents,
    config
):
    # Send a single generate_content request to the given model.
    print(f"\nAttempting extraction using model: {model}")
    response = client.models.generate_content(
        model=model,
        contents=contents,
        config=config
    )
    return response

def print_result(model: str, response: str, start_time: float):
    # `response` is the JSON string produced by model_dump_json() at the call sites.
    print(f"\nExtraction successful using model: {model}!")
    print(response)
    print("\nTotal time taken: %s seconds" % round((time.time() - start_time), 2))

if __name__ == "__main__":
    start_time = time.time()

    load_dotenv()
    MAIN_MODEL = os.getenv('MAIN_MODEL')
    MINI_MODEL = os.getenv('MINI_MODEL')
    TEMPERATURE = float(os.getenv('TEMPERATURE'))

    client = genai.Client()

    input_file_path = "OutputFiles/Renewal 2025_extracted.txt"
    # Assigned before the try block so the except handler can always reference it.
    model = MAIN_MODEL
    try:
        file = client.files.upload(
            file=input_file_path,
            config={'mime_type': 'text/plain'}
        )

        contents = [
            get_gb_prompt_for_excel(naic="23"),
            file
        ]

        config = types.GenerateContentConfig(
            system_instruction=get_system_prompt(),
            temperature=TEMPERATURE,
            response_mime_type='application/json',
            response_schema=GroupBenefit
        )

        response = extract_parameters(
            client=client,
            model=model,
            contents=contents,
            config=config
        )
        if hasattr(response, "parsed") and response.parsed:
            print_result(model, response.parsed.model_dump_json(indent=2), start_time)
    except Exception as e:
        err = str(e).lower()
        # Fall back to the smaller model only for overload-type errors.
        if any(sub in err for sub in ["503", "unavailable", "overloaded"]):
            print(f"Model ({model}) overloaded. Retrying with another model…")
            try:
                model = MINI_MODEL
                response = extract_parameters(
                    client=client,
                    model=model,
                    contents=contents,
                    config=config
                )
                if hasattr(response, "parsed") and response.parsed:
                    print_result(model, response.parsed.model_dump_json(indent=2), start_time)
            except Exception as e:
                print("\nExtraction failed after all retries!")
                print(str(e))
                raise
        else:
            print(err)
            raise

====================================================================

The main and mini models are as follows:

MAIN_MODEL=gemini-2.5-pro
MINI_MODEL=gemini-2.5-flash

Whenever I run the code, I almost always get the following error for "gemini-2.5-pro", and sometimes for "gemini-2.5-flash":

503 UNAVAILABLE. {'error': {'code': 503, 'message': 'The model is overloaded. Please try again later.', 'status': 'UNAVAILABLE'}}

I am using the free tier of the API.

Can you please resolve this issue at the earliest?

I am having the same issue on my end.

same here over the last few days

Yes same issue here today using either 2.5 pro or flash. Model overload. Changed my location env. variables from us-central1 to us-east4. Worked for a bit and then same errors.

We are observing the same behaviour across our apps, and, surprisingly, Google's page does not acknowledge any degradation of service.

Exactly the same here. New customer, not on the free tier. Here's the situation yesterday. Currently I couldn't go anywhere near this for production. Simpler prompts get through 2/3 times, the more involved ones take 5-10 refreshes before they finally get through. All 503s.
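Since the 503s reported in this thread look transient (the same request eventually succeeds after several tries), one common mitigation is to retry with exponential backoff and jitter before falling back to a second model. The sketch below is only an illustration under stated assumptions: `is_overloaded` and `call_with_backoff` are hypothetical helper names, the check for a numeric `code` attribute on the exception is an assumption about the SDK's error objects, and the substring check mirrors the one in the original post. It does not remove the overload on the server side; it only spaces out the retries.

```python
import random
import time


def is_overloaded(exc: Exception) -> bool:
    """Return True if the exception looks like a transient 503/overload error."""
    # Assumption: the exception may expose a numeric `code` attribute.
    if getattr(exc, "code", None) == 503:
        return True
    # Fallback: same substring check as in the original post.
    text = str(exc).lower()
    return any(s in text for s in ("503", "unavailable", "overloaded"))


def call_with_backoff(call, models, max_attempts=4, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Try each model in order, retrying overload errors with backoff.

    `call` is any function that takes a model name and returns a response.
    Returns a (model, response) tuple for the first successful call.
    """
    last_exc = None
    for model in models:
        for attempt in range(max_attempts):
            try:
                return model, call(model)
            except Exception as exc:
                if not is_overloaded(exc):
                    raise  # not transient, so retrying will not help
                last_exc = exc
                if attempt < max_attempts - 1:
                    # Delay doubles each attempt (1s, 2s, 4s, ...), capped,
                    # plus random jitter so clients do not retry in lockstep.
                    delay = min(max_delay, base_delay * 2 ** attempt)
                    sleep(delay + random.uniform(0, delay / 2))
    if last_exc is None:
        raise ValueError("models must not be empty")
    raise last_exc


# Hypothetical usage with the names from the original post:
# model, response = call_with_backoff(
#     lambda m: extract_parameters(client, m, contents, config),
#     [MAIN_MODEL, MINI_MODEL],
# )
```

Passing the request as a callable keeps the retry logic independent of the SDK, so the same helper can wrap `extract_parameters` or any other request function.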

Hi @Madhusree_Rana,

Please let me know if you are still facing this issue.

Yes, with the gemini-3-pro-image-preview model, especially for 4K requests.

This service has been unreliable and unsuitable for business use. I was evaluating Google AI as a fallback option for our benchmarking workflow, and out of only 20 test requests, around 40% failed with 503 errors. The same requests work on other models through vLLM or SGLang in under 5 seconds.

I tested multiple models in the Gemini API Console and AI Studio and encountered the same issue. I have also opened two separate support incidents.

Because of the repeated failures, I was requesting a full refund of all credits and usage charges. The current level of reliability is not acceptable for our business, and we are moving away from the platform.