Why am I always getting Status 429? Very frustrating

Six days ago I had no problem sending API requests to Gemini 1.5 Pro via the Python SDK. Today I can't get a single request through to save my neck! It's always Status 429, resource exhausted.

Tried:

  • Creating another API key
  • Activating my Workspace trial (?)
  • Creating yet another API key

Nothing works…

While I'm creating the API key, the site shows Free of charge, but after I create the key and refresh the browser, it shows Paid!? 

How am I supposed to troubleshoot this?

What your screenshot shows is a free account, not a billed account. The 'Set up billing' link is there to switch the project from free to billed, if you want to change it.

The other link right below that, ‘View usage data’, can be used to monitor API usage to some extent.

Hope that helps.


Thank you for the reply.

The account is paid. While I'm creating the API key, it says Free of charge, but after the key has been created and I refresh the browser, it says Paid.

Did you enable retry in your Python code?

I never see a 429 with it enabled, and I'm also using the free tier.

from dotenv import load_dotenv
import google.generativeai as genai
from google.generativeai.types import RequestOptions
from google.api_core import retry
import os

load_dotenv()
print(os.getenv("API_KEY"))  # debug check that the key loaded; avoid printing keys in real logs
genai.configure(api_key=os.getenv("API_KEY"))

model = genai.GenerativeModel('gemini-1.5-pro-latest')

# Retry with exponential backoff: first wait 10 s, doubling each attempt,
# individual waits capped at 60 s, giving up after 300 s total.
response = model.generate_content(
    "What is the meaning of life in two sentences",
    request_options=RequestOptions(
        retry=retry.Retry(initial=10, multiplier=2, maximum=60, timeout=300)
    )
)

I’ll try this now. What is the pip install for google.api_core? Thank you.

(Disregard: pip install google-api-core worked.)

Hm…I’m using .send_message vs. .generate_content. Does this matter?

def submit_gemini_query(api_key, system_message, user_message):
    
    genai.configure(api_key=api_key)

    safety_settings = [ 
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"}, 
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"}, 
        {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"}, 
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"}]

    generation_config = {
        "temperature": 0,
        "max_output_tokens": 8192
    }
    
    model = genai.GenerativeModel(
        model_name="gemini-1.5-pro-latest",
        generation_config=generation_config,
        system_instruction=system_message,
        safety_settings=safety_settings
    )

    chat_session = model.start_chat(history=[])

    response = chat_session.send_message(user_message)

    return response.text

Not really. send_message() is a wrapper that maintains conversation history; the underlying API operation is the same as generate_content().


Hm…After implementing your retry suggestion…

import google.generativeai as genai
from google.generativeai.types import RequestOptions
from google.api_core import retry

#--------------------------------------------------------

def submit_gemini_query(api_key, system_message, user_message):
    
    genai.configure(api_key=api_key)

    safety_settings = [ 
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"}, 
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"}, 
        {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"}, 
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"}]

    generation_config = {
        "temperature": 0,
        "max_output_tokens": 8192
    }
    
    model = genai.GenerativeModel(
        model_name="gemini-1.5-pro-latest",
        generation_config=generation_config,
        system_instruction=system_message,
        safety_settings=safety_settings
    )

    chat_session = model.start_chat(history=[])

    response = chat_session.send_message(user_message,
                                         request_options=RequestOptions(
                                             retry=retry.Retry(
                                                 initial=10, 
                                                 multiplier=2, 
                                                 maximum=60, 
                                                 timeout=300
                                             )
                                         )
                                        )

    return response.text

I receive:

### Error: Timeout of 300.0s exceeded, last exception: 503 The model is overloaded. Please try again later.

Well, that's not something you can fix. Too many people are using it at the same time.

Hm…I don't buy the HTTP responses from Gemini. I was getting all 429 codes until I made your suggested retry changes; then the code switched to 503. With requests spaced at 10-minute intervals, is Gemini typically that unavailable?

Yep…just tested the API call again this morning…same 503 message. Smells like an erroneous return code.

What the heck!?!

Newbie here… I asked Gemini about this, and below is its response. Dunno if it's of any help…

Understanding the “429 Resource Exhausted” Error with Google Gemini

The “429 Resource Exhausted” error typically indicates that you’ve exceeded a rate limit or quota for the Google Gemini API.

Common Causes:

  1. Exceeding Rate Limits:
    • Burst Rate: Sending too many requests within a short period can trigger this error.
    • Sustained Rate: Consistently exceeding the average rate limit over a longer period can also lead to this issue.
  2. Quota Exhaustion:
    • Your project might have reached its overall quota for the Gemini API or specific model usage.
  3. Resource Constraints:
    • Google’s infrastructure might be experiencing temporary overload or maintenance.
  4. API Usage Patterns:
    • Inefficient usage patterns, such as redundant requests or excessive data in requests, can contribute to resource exhaustion.

Troubleshooting Steps:

  1. Check API Documentation:

    • Review the official Gemini API documentation for specific rate limits, quotas, and best practices.
    • Understand the error messages and potential solutions provided in the documentation.
  2. Monitor API Usage:

    • Use the Google Cloud Console or API monitoring tools to track your API usage and identify usage patterns.
    • Analyze request frequency, response times, and error rates.
  3. Implement Rate Limiting:

    • Introduce delays or backoff mechanisms between requests to avoid exceeding rate limits.
    • Use libraries or frameworks that provide built-in rate limiting features.
  4. Optimize Requests:

    • Reduce the size of your requests by removing unnecessary data.
    • Combine multiple requests into a single request if possible.
    • Explore batching or streaming options for large datasets.
  5. Increase Quotas:

    • If you’re consistently hitting your quota, consider requesting an increase through the Google Cloud Console or support channels.
  6. Error Handling:

    • Implement proper error handling in your code to gracefully handle the “429” error and retry requests with appropriate delays.

Additional Tips:

  • Exponential Backoff: Implement exponential backoff to gradually increase retry delays after encountering multiple consecutive errors.
  • Jitter: Introduce random delays to avoid synchronized retries from multiple clients.
  • Circuit Breaker: Consider using a circuit breaker pattern to temporarily stop requests when errors occur frequently.
  • Asynchronous Processing: If possible, process requests asynchronously to improve throughput and handle rate limits more effectively.
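The "Exponential Backoff" and "Jitter" tips above combine naturally into the so-called full-jitter strategy. A minimal sketch (the function name and defaults are my own, for illustration only):

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Full-jitter exponential backoff: pick a random delay in
    [0, min(cap, base * 2**attempt)], so that many clients hitting
    the same error don't all retry at the same instant."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

On attempt 0 the wait is at most `base` seconds; each later attempt doubles the ceiling until it hits `cap`.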

By following these guidelines and carefully analyzing your API usage patterns, you should be able to effectively address the “429 Resource Exhausted” error and optimize your interactions with the Google Gemini API.

Would you like to provide more details about your specific use case and the code you’re using to interact with the Gemini API? This information could help in providing more tailored recommendations.


I don't think it's a resource-exhaustion issue.
Rationale:

  • The code worked, as-is, last week.
    • The only thing that changed was that my Workspace trial period ended.
    • I activated my permanent Workspace account and recreated the API key.
  • I'm only issuing 1 API request per 10-20 minutes.
  • Yes, I'm sending 1M+ tokens, but I have a paid account and I'm using gemini-1.5-pro-latest, which has a context window of, what, 2M tokens?
    • The model card also says it can handle something like 350 requests per minute.
    • I'm 349/min below that.
    • Also, when I added the retry logic, the status code changed to 503 Model is overloaded.
    • I tested the pipeline over a couple of days…same 503 message.

This is a major blocker at this point.

I will hazard a guess you were getting the 503s before as well. You were getting so many more 429s that the 503s were swamped and not noticeable. The 503 is, as it describes, an overloaded backend, which is also something that can be automatically retried.

The advice: use retry logic with exponential backoff and deem any of 429, 500, and 503 as retry-eligible.

Hope that helps!


Just tested again…no code change beyond what's shown above from a couple of days ago…hm…it works…cool.

Thank you all for engaging and pitching ideas over the past few days. The retry logic will make this pipeline more robust for sure.

Take care!

And we’re back to 503 errors after 2 successful responses. Must be a retry issue. Going to craft up some custom logic around this.

I ran a test too: 50 requests with a 30-second gap (a sleep) between them. 49 returned 200; one was a 503. In my custom retry code, I used to test status in [429, 500], and I updated that to status in [429, 500, 503].


Care to share your custom retry logic or point to a good example online? I don’t see anything like that in the official cookbook.

My custom solution is in a library written from scratch that uses the Python requests library for HTTP (no gRPC), so I have the status code as a value directly. Not applicable to anyone using the Google library.
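For anyone else doing raw HTTP, a guess at the shape of such a status-code retry loop (the function name, defaults, and endpoint handling are hypothetical; the poster's actual library isn't shown):

```python
import random
import time

import requests

RETRYABLE = {429, 500, 503}  # the statuses discussed in this thread

def post_with_retry(url, payload, max_attempts=5, base_delay=10, cap=60):
    """POST with exponential backoff plus jitter on retryable statuses."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        resp = requests.post(url, json=payload, timeout=120)
        if resp.status_code not in RETRYABLE or attempt == max_attempts:
            return resp
        # Sleep a random fraction of the current delay, then double it (capped).
        time.sleep(random.uniform(0, delay))
        delay = min(delay * 2, cap)
    return resp
```

The last attempt's response is returned unchanged, so the caller still sees the final status code if all retries fail.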

This has an implementation on top of the Google library: How to implement exponential backoff with gemini? ... - Google Cloud Community
