Why am I always getting Status 429? Very frustrating

Six days ago I had no problem sending API requests to Gemini 1.5 Pro via the Python SDK. Today I can't get a single request through to save my neck! It's always Status 429, resource exhausted.

Tried:

  • Creating another API key
  • Activating my Workspace trial (?)
  • Creating yet another API key

Nothing works…

While I'm creating the API key, the site shows Free of charge, but after I create the key and refresh the browser, it shows Paid!? 

How am I supposed to troubleshoot this?

What your screenshot shows is a free account, not a billed account. The 'Set up billing' link is there to switch the project from free to billed, if you want to change it.

The other link right below that, ‘View usage data’, can be used to monitor API usage to some extent.

Hope that helps.


Thank you for the reply.

The account is paid. While I'm creating the API key, it says Free of charge, but after the key has been created and I refresh the browser, it says Paid.

Did you enable retry in your Python code?

I never see a 429 with it enabled, and I'm also using the free tier.

from dotenv import load_dotenv
import google.generativeai as genai
from google.generativeai.types import RequestOptions
from google.api_core import retry
import os

load_dotenv()
print(os.getenv("API_KEY"))  # debug check that the key loaded; avoid printing keys in real logs
genai.configure(api_key=os.getenv("API_KEY"))

model = genai.GenerativeModel('gemini-1.5-pro-latest')

# Retry with exponential backoff: first wait 10 s, doubling each attempt,
# individual waits capped at 60 s, giving up after 300 s total.
response = model.generate_content(
    "What is the meaning of life in two sentences",
    request_options=RequestOptions(
        retry=retry.Retry(initial=10, multiplier=2, maximum=60, timeout=300)
    )
)

I’ll try this now. What is the pip install for google.api_core? Thank you.

(Disregard: pip install google-api-core worked.)

Hm…I’m using .send_message vs. .generate_content. Does this matter?

def submit_gemini_query(api_key, system_message, user_message):
    
    genai.configure(api_key=api_key)

    safety_settings = [ 
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"}, 
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"}, 
        {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"}, 
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"}]

    generation_config = {
        "temperature": 0,
        "max_output_tokens": 8192
    }
    
    model = genai.GenerativeModel(
        model_name="gemini-1.5-pro-latest",
        generation_config=generation_config,
        system_instruction=system_message,
        safety_settings=safety_settings
    )

    chat_session = model.start_chat(history=[])

    response = chat_session.send_message(user_message)

    return response.text

Not really. send_message() is a wrapper that maintains conversation history; the underlying API operation is the same as generate_content().


Hm…After implementing your retry suggestion…

import google.generativeai as genai
from google.generativeai.types import RequestOptions
from google.api_core import retry

#--------------------------------------------------------

def submit_gemini_query(api_key, system_message, user_message):
    
    genai.configure(api_key=api_key)

    safety_settings = [ 
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"}, 
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"}, 
        {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"}, 
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"}]

    generation_config = {
        "temperature": 0,
        "max_output_tokens": 8192
    }
    
    model = genai.GenerativeModel(
        model_name="gemini-1.5-pro-latest",
        generation_config=generation_config,
        system_instruction=system_message,
        safety_settings=safety_settings
    )

    chat_session = model.start_chat(history=[])

    response = chat_session.send_message(user_message,
                                         request_options=RequestOptions(
                                             retry=retry.Retry(
                                                 initial=10, 
                                                 multiplier=2, 
                                                 maximum=60, 
                                                 timeout=300
                                             )
                                         )
                                        )

    return response.text

I receive:

### Error: Timeout of 300.0s exceeded, last exception: 503 The model is overloaded. Please try again later.

Well, that's not something you can fix. Too many people are using it at the same time.

Hm…I don't buy the HTTP responses from Gemini. I was getting all 429 codes until I made your suggested retry changes; then the code switched to 503. With requests spaced at 10-minute intervals, is Gemini typically that unavailable?

Yep…just tested the API call again this morning…same 503 message. Smells like an erroneous return code.

What the heck!?!

Newbie here… I asked Gemini about this, and below is its response. Dunno if it's of any help…

Understanding the “429 Resource Exhausted” Error with Google Gemini

The “429 Resource Exhausted” error typically indicates that you’ve exceeded a rate limit or quota for the Google Gemini API.

Common Causes:

  1. Exceeding Rate Limits:
    • Burst Rate: Sending too many requests within a short period can trigger this error.
    • Sustained Rate: Consistently exceeding the average rate limit over a longer period can also lead to this issue.
  2. Quota Exhaustion:
    • Your project might have reached its overall quota for the Gemini API or specific model usage.
  3. Resource Constraints:
    • Google’s infrastructure might be experiencing temporary overload or maintenance.
  4. API Usage Patterns:
    • Inefficient usage patterns, such as redundant requests or excessive data in requests, can contribute to resource exhaustion.

Troubleshooting Steps:

  1. Check API Documentation:

    • Review the official Gemini API documentation for specific rate limits, quotas, and best practices.
    • Understand the error messages and potential solutions provided in the documentation.
  2. Monitor API Usage:

    • Use the Google Cloud Console or API monitoring tools to track your API usage and identify usage patterns.
    • Analyze request frequency, response times, and error rates.
  3. Implement Rate Limiting:

    • Introduce delays or backoff mechanisms between requests to avoid exceeding rate limits.
    • Use libraries or frameworks that provide built-in rate limiting features.
  4. Optimize Requests:

    • Reduce the size of your requests by removing unnecessary data.
    • Combine multiple requests into a single request if possible.
    • Explore batching or streaming options for large datasets.
  5. Increase Quotas:

    • If you’re consistently hitting your quota, consider requesting an increase through the Google Cloud Console or support channels.
  6. Error Handling:

    • Implement proper error handling in your code to gracefully handle the “429” error and retry requests with appropriate delays.

Additional Tips:

  • Exponential Backoff: Implement exponential backoff to gradually increase retry delays after encountering multiple consecutive errors.
  • Jitter: Introduce random delays to avoid synchronized retries from multiple clients.
  • Circuit Breaker: Consider using a circuit breaker pattern to temporarily stop requests when errors occur frequently.
  • Asynchronous Processing: If possible, process requests asynchronously to improve throughput and handle rate limits more effectively.
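The "Exponential Backoff" and "Jitter" tips above combine naturally into the so-called full-jitter strategy. A minimal sketch (the function name and defaults are my own, for illustration only):

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Full-jitter exponential backoff: pick a random delay in
    [0, min(cap, base * 2**attempt)], so that many clients hitting
    the same error don't all retry at the same instant."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

On attempt 0 the wait is at most `base` seconds; each later attempt doubles the ceiling until it hits `cap`.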

By following these guidelines and carefully analyzing your API usage patterns, you should be able to effectively address the “429 Resource Exhausted” error and optimize your interactions with the Google Gemini API.

Would you like to provide more details about your specific use case and the code you’re using to interact with the Gemini API? This information could help in providing more tailored recommendations.


I don't think it's a resource-exhaustion issue.
Rationale:

  • The code worked, as-is, last week.
    • The only thing that changed was that my Workspace trial period ended.
    • I activated my permanent Workspace account and recreated the API key.
  • I'm only issuing 1 API request per 10-20 minutes.
  • Yes, I'm sending 1M+ tokens, but I have a paid account and I'm using gemini-1.5-pro-latest, which has a context window of, what, 2M tokens?
    • The model card also says it can handle something like 350 requests per minute.
    • I'm 349/min below that.
    • Also, when I added the retry logic, the status code changed to 503 Model is overloaded.
    • I tested the pipeline over a couple of days…same 503 message.

This is a major blocker at this point.

I will hazard a guess you were getting the 503s before as well. You were getting so many more 429s that the 503s were swamped and not noticeable. The 503 is, as it describes, an overloaded backend, which is also something that can be automatically retried.

The advice: use retry logic with exponential backoff and deem any of 429, 500, and 503 as retry-eligible.

Hope that helps!


Just tested again…no code change beyond what's shown above from a couple of days ago…hm…it works…cool.

Thank you all for engaging and pitching ideas over the past few days. The retry logic will make this pipeline more robust for sure.

Take care!

And we’re back to 503 errors after 2 successful responses. Must be a retry issue. Going to craft up some custom logic around this.

I ran a test too: 50 requests with a 30-second gap (a sleep) between them. 49 returned 200; one was a 503. In my custom retry code, I used to test status in [429, 500], and I updated that to status in [429, 500, 503].


Care to share your custom retry logic or point to a good example online? I don’t see anything like that in the official cookbook.

My custom solution is in a library written from scratch that uses the Python requests library for HTTP (no gRPC), so I have the status code as a value directly. Not applicable to anyone using the Google library.
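For anyone else doing raw HTTP, a guess at the shape of such a status-code retry loop (the function name, defaults, and endpoint handling are hypothetical; the poster's actual library isn't shown):

```python
import random
import time

import requests

RETRYABLE = {429, 500, 503}  # the statuses discussed in this thread

def post_with_retry(url, payload, max_attempts=5, base_delay=10, cap=60):
    """POST with exponential backoff plus jitter on retryable statuses."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        resp = requests.post(url, json=payload, timeout=120)
        if resp.status_code not in RETRYABLE or attempt == max_attempts:
            return resp
        # Sleep a random fraction of the current delay, then double it (capped).
        time.sleep(random.uniform(0, delay))
        delay = min(delay * 2, cap)
    return resp
```

The last attempt's response is returned unchanged, so the caller still sees the final status code if all retries fail.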

This has an implementation on top of the Google library: How to implement exponential backoff with gemini? ... - Google Cloud Community
