429 errors persist despite waiting for `retryDelay`

Hello,

I am trying to handle retries for requests to the Gemini API using the tenacity library. However, I’m running into an issue with 429 (rate limit) errors in particular.

My current retry logic is intended to:

  1. Parse the retryDelay value from the APIError details.
  2. Add 5 seconds to this retryDelay.
  3. Combine this with an exponential backoff strategy (wait_exponential).

The waiting logic appears to be functioning correctly, but the 429 errors persist despite multiple retries.

Here is a snippet of my code:

# ... (imports and other setup)

def _is_retriable(e: BaseException) -> bool:
    return isinstance(e, APIError) and e.code in [503, 429]

def _calc_retry_delay(exception: BaseException | None) -> float:
    # ... (logic to parse retryDelay and add 5 seconds)
    if isinstance(exception, APIError) and exception.code == 429:
        try:
            retry_delay = parse(
                [
                    rd
                    for d in exception.details["error"]["details"]
                    if (rd := d.get("retryDelay"))
                ][0]
            )
            if retry_delay:
                return retry_delay + 5
        except (IndexError, KeyError):
            return 60
    return 60

class wait_from_exception(wait_base):
    def __call__(self, retry_state: "RetryCallState") -> float:
        if retry_state.outcome is None:
            return 0
        exception = retry_state.outcome.exception()
        return _calc_retry_delay(exception)

@retry(
    wait=wait_combine(wait_from_exception(), wait_exponential(multiplier=2, min=10, max=300)),
    retry=retry_if_exception(_is_retriable),
    stop=stop_after_attempt(10),
)
async def make_summary(issue_id: str, db: Session):
    # ... (API call)

What could be the reason for this? Am I misunderstanding how to correctly handle the retryDelay from the API response? Any insights or suggestions would be greatly appreciated.

Thank you.

Hi @KobaDev
A 429 error means you are sending too many requests per minute on the free tier of the Gemini API.
Verify that you’re within the model’s rate limit (see the Gemini API rate limits documentation: https://ai.google.dev/gemini-api/docs/rate-limits) and, if needed, request a quota increase.

You can also check your quota limit like this:
Go to the GCP console and click “APIs & Services”. Under Metric, search for and select “Generative Language API”. Under the “Quotas & System Limits” tab, check the “Current Usage percentage”.

If it reaches 100%, you have hit your quota limit, hence the 429 error.

Hi @Pannaga_J

I received a 429 response body like this:

{
   "error":{
      "code":429,
      "message":"You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
      "status":"RESOURCE_EXHAUSTED",
      "details":[
         {
            "@type":"type.googleapis.com/google.rpc.QuotaFailure",
            "violations":[
               {
                  "quotaMetric":"generativelanguage.googleapis.com/generate_content_free_tier_requests",
                  "quotaId":"GenerateRequestsPerDayPerProjectPerModel-FreeTier",
                  "quotaDimensions":{
                     "model":"gemini-2.5-flash",
                     "location":"global"
                  },
                  "quotaValue":"250"
               }
            ]
         },
         {
            "@type":"type.googleapis.com/google.rpc.Help",
            "links":[
               {
                  "description":"Learn more about Gemini API quotas",
                  "url":"https://ai.google.dev/gemini-api/docs/rate-limits"
               }
            ]
         },
         {
            "@type":"type.googleapis.com/google.rpc.RetryInfo",
            "retryDelay":"1s"
         }
      ]
   }
}

Do you mean this retryDelay is not useful for free-tier users?

Based on the error message you provided, this is a quota exhaustion error, not a temporary rate-limiting error, although both result in a 429 HTTP status code. The error tells you that you have exceeded a specific daily quota, GenerateRequestsPerDayPerProjectPerModel-FreeTier, which has a limit of 250 requests. See the rate limits documentation: https://ai.google.dev/gemini-api/docs/rate-limits.
That’s why waiting for retryDelay does not help in your case: a short wait does not restore a daily quota.
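
To make the retry logic stop early in this situation, one option is to inspect the 429 body before retrying. The following is only a minimal sketch: the helper names `is_daily_quota_exhausted` and `should_retry` are hypothetical, and matching on "PerDay" in quotaId is a heuristic based on the single response quoted in this thread, not a documented contract.

```python
from typing import Any


def is_daily_quota_exhausted(body: dict[str, Any]) -> bool:
    """Return True if a 429 body reports a per-day quota violation.

    Assumption: per-day quotas are recognisable by "PerDay" in quotaId,
    as in the response quoted in this thread.
    """
    error = body.get("error", {})
    for detail in error.get("details", []):
        # Only QuotaFailure entries carry quota violations.
        if not detail.get("@type", "").endswith("google.rpc.QuotaFailure"):
            continue
        for violation in detail.get("violations", []):
            if "PerDay" in violation.get("quotaId", ""):
                return True
    return False


def should_retry(status_code: int, body: dict[str, Any]) -> bool:
    """Retry 503s and 429s, except 429s caused by a daily quota."""
    if status_code == 503:
        return True
    if status_code == 429:
        return not is_daily_quota_exhausted(body)
    return False
```

A predicate like this could be called from the function passed to tenacity's `retry_if_exception`, so that a daily quota failure is raised immediately instead of being retried ten times.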

Thank you for the clarification.

I have two questions/suggestions regarding the API design for 429 errors:

  1. The presence of RetryInfo with a non-applicable retryDelay value is misleading. Could you consider either removing RetryInfo for quota-related errors or providing a meaningful retryDelay value (e.g., the time until the quota resets)?
  2. Is there a way for developers to programmatically distinguish between a temporary rate-limiting error and a quota exhaustion error in the current 429 response body?

Thanks,

Thank you for your suggestions.

Regarding your first point, it’s a good one, and we will discuss it with our internal team. We’ll update you as soon as possible.

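For your second point, please check the JSON response you received. The error.status field is RESOURCE_EXHAUSTED, and error.details contains a QuotaFailure entry whose violations include the quotaId and quotaValue fields. These help you distinguish the two cases: here the quotaId, GenerateRequestsPerDayPerProjectPerModel-FreeTier, identifies a per-day quota.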
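
As an illustration of reading those fields, here is a small sketch that pulls them out of a 429 body shaped like the one posted above. The function name `summarize_429` and the shape of the returned dictionary are made up for this example, and it assumes retryDelay is a seconds string such as "1s".

```python
from typing import Any

QUOTA_FAILURE = "type.googleapis.com/google.rpc.QuotaFailure"
RETRY_INFO = "type.googleapis.com/google.rpc.RetryInfo"


def summarize_429(body: dict[str, Any]) -> dict[str, Any]:
    """Collect status, violated quotas, and retry delay from a 429 body."""
    error = body.get("error", {})
    summary: dict[str, Any] = {
        "status": error.get("status"),
        "quota_ids": [],
        "quota_values": [],
        "retry_delay_seconds": None,
    }
    for detail in error.get("details", []):
        kind = detail.get("@type")
        if kind == QUOTA_FAILURE:
            for violation in detail.get("violations", []):
                summary["quota_ids"].append(violation.get("quotaId"))
                summary["quota_values"].append(violation.get("quotaValue"))
        elif kind == RETRY_INFO:
            raw = detail.get("retryDelay", "")
            # Assumption: the delay is formatted as seconds, e.g. "1s" or "1.5s".
            if raw.endswith("s"):
                try:
                    summary["retry_delay_seconds"] = float(raw[:-1])
                except ValueError:
                    pass  # Leave as None if the value is not numeric.
    return summary
```

With the response from this thread, `quota_ids` would contain GenerateRequestsPerDayPerProjectPerModel-FreeTier, which can then be logged or used to decide whether retrying is worthwhile.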