RESOURCE_EXHAUSTED when using gemini-1.5-pro-002

When I call gemini-1.5-pro-002 through the API with a prompt of more than 20 thousand tokens, using either Curl or Python (the error is the same in both cases), it returns RESOURCE_EXHAUSTED even though the limits have not been reached.
But when I use gemini-1.5-pro-exp-0827, everything works.
gemini-1.5-pro-latest and gemini-1.5-pro do not work either.

In some cases this happens with 0827 too. Once, when I uploaded a file of about 1M tokens, AI Studio returned the error “You are reaching your limit”. I have no idea why this happens. Maybe that many tokens is the problem, idk…

Also, wth are you writing in your prompt? A book? XD

Welcome to the forums!

Can you elaborate on what, exactly, you mean by this, and how you’re determining if you’ve reached “the limits” or not?

It isn’t unusual to encounter error 429 on occasion, and you should implement an incremental backoff for such cases. If you’re routinely hitting this on every request, there may be other issues we should look into.
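To make the backoff suggestion concrete, here is a minimal sketch in Python. It uses exponential backoff with jitter, which is one common way to do an incremental backoff. The `send` callable is a hypothetical placeholder for whatever function performs your actual request and returns a `(status_code, body)` pair; the default delays are assumptions, not values recommended by the API.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send is a hypothetical zero-argument callable returning (status_code, body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # Delay grows as base, 2*base, 4*base, ... capped at max_delay.
        delay = min(max_delay, base_delay * (2 ** attempt))
        # Jitter (50-100% of the delay) keeps parallel clients from
        # retrying in lockstep.
        sleep(delay * random.uniform(0.5, 1.0))
    # Still rate limited after all attempts: return the last 429 response.
    return status, body
```

If every attempt still comes back 429, the function simply returns the last response, so the caller can inspect the error body instead of retrying forever.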

Thank you. I mean that I hadn’t used this API key for more than 24 hours, and when I tried again, I waited for more than 1 minute.

Yeah, the request itself plus its past responses, so that the model remembers how to respond. With the experimental models I used up to 1.5 million tokens and everything worked.

Here is an example: I send a request via Curl to the gemini-1.5-pro-002 model, and almost immediately I get the error 429 RESOURCE_EXHAUSTED.

Url: https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-002:generateContent?key=ThereWasApi
Status: 429

vary: X-Origin
vary: Referer
vary: Origin,Accept-Encoding
content-type: application/json; charset=UTF-8
date: Mon, 30 Sep 2024 15:28:04 GMT
server: scaffolding on HTTPServer2
cache-control: private
x-xss-protection: 0
x-frame-options: SAMEORIGIN
x-content-type-options: nosniff
server-timing: gfet4t7; dur=981
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
accept-ranges: none

{
  "error": {
    "code": 429,
    "message": "Resource has been exhausted (e.g. check quota).",
    "status": "RESOURCE_EXHAUSTED"
  }
}

Then, after a couple of seconds, I send exactly the same request, but for the gemini-1.5-pro-exp-0827 model, and after a while I get the answer I need.

Url: https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-exp-0827:generateContent?key=ThereWasApi
Status: 200

content-type: application/json; charset=UTF-8
vary: X-Origin
vary: Referer
vary: Origin,Accept-Encoding
date: Mon, 30 Sep 2024 15:29:46 GMT
server: scaffolding on HTTPServer2
cache-control: private
x-xss-protection: 0
x-frame-options: SAMEORIGIN
x-content-type-options: nosniff
server-timing: gfet4t7; dur=62146
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
accept-ranges: none

{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "Response"
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP",
      "index": 0,
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 37669,
    "candidatesTokenCount": 1767,
    "totalTokenCount": 39436
  }
}

Here are two of these queries

Can you take a look at the Quotas & System Limits page for your project and see if it reports what your quota usage has been?

Also seeing the “Traffic by Response Code” graph on the Metrics page might give some insight as well.

Do check the quotas. I found out that, e.g., gemini-1.5-flash-8b-exp... has a limit of 15 RPM per project. Because it’s experimental, I guess Google didn’t allocate much capacity to it.
The 1.5-pro-exp has a limit of 2 RPM (!)

The 1.5-flash has 2000, but that’s misleading because there is another quota, RPM per region, that limits it to 1500.

I’ve always found these quotas difficult to follow, and there is no tool that lists all the relevant quotas for a given API call.

I followed these links and noticed that, for some reason, at first all of my limits show 0


I don’t understand why

Url: https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-002:generateContent?key=MyApi
Status: 429

vary: X-Origin
vary: Referer
vary: Origin,Accept-Encoding
content-type: application/json; charset=UTF-8
date: Wed, 02 Oct 2024 02:40:54 GMT
server: scaffolding on HTTPServer2
cache-control: private
x-xss-protection: 0
x-frame-options: SAMEORIGIN
x-content-type-options: nosniff
server-timing: gfet4t7; dur=1254
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
accept-ranges: none

{
  "error": {
    "code": 429,
    "message": "Quota exceeded for quota metric 'Generate Content API requests per minute' and limit 'GenerateContent request limit per minute for a region' of service 'generativelanguage.googleapis.com' for consumer 'project_number:1062870507181'.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "reason": "RATE_LIMIT_EXCEEDED",
        "domain": "googleapis.com",
        "metadata": {
          "service": "generativelanguage.googleapis.com",
          "consumer": "projects/1062870507181",
          "quota_limit": "GenerateContentRequestsPerMinutePerProjectPerRegion",
          "quota_metric": "generativelanguage.googleapis.com/generate_content_requests",
          "quota_location": "us-east2",
          "quota_limit_value": "0"
        }
      },
      {
        "@type": "type.googleapis.com/google.rpc.Help",
        "links": [
          {
            "description": "Request a higher quota limit.",
            "url": "https://cloud.google.com/docs/quotas/help/request_increase"
          }
        ]
      }
    ]
  }
}
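
For anyone hitting the same thing, the detailed error body above can be checked in code. This is a rough sketch that pulls the quota fields out of the `google.rpc.ErrorInfo` entry; the function name `quota_info` is made up for illustration, and the field names are taken only from the response shown above, so other errors may not carry them. My assumption is that a `quota_limit_value` of "0" means no request can pass under that quota, in which case retrying with a backoff would not help.

```python
import json


def quota_info(error_body):
    """Extract quota fields from a 429 error body like the one above.

    Returns a dict with the limit name, location, and limit value,
    or None if the body has no google.rpc.ErrorInfo detail.
    """
    payload = json.loads(error_body)
    for detail in payload.get("error", {}).get("details", []):
        if detail.get("@type", "").endswith("google.rpc.ErrorInfo"):
            meta = detail.get("metadata", {})
            return {
                "limit": meta.get("quota_limit"),
                "location": meta.get("quota_location"),
                "value": meta.get("quota_limit_value"),
            }
    return None


def is_zero_quota(error_body):
    """True when the error reports a quota limit of exactly "0"."""
    info = quota_info(error_body)
    return info is not None and info["value"] == "0"
```

With the response above, `quota_info` would report the limit `GenerateContentRequestsPerMinutePerProjectPerRegion` in `us-east2` with a value of "0", while the shorter error from the first request has no `details` list, so it would return `None`.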