429 Errors on Large Prompt

I’m attempting to use Gemini 1.5 Pro with a large prompt.

I believe I’m using a paid account (see the attached image).

The prompt is large (400k characters), but it should be well under the 2M-token context window.

Whenever I call the API with Gemini 1.5 Pro (both the latest and 001 versions), I get this error:

[GoogleGenerativeAI Error]: Error fetching from https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-latest:generateContent: [429 Too Many Requests] Resource has been exhausted (e.g. check quota).

When I run the exact same call with 1.5 Flash, it works as intended.

I can’t see any quota issues in my Google Cloud Console.

Adding more information for context:

  • I’m only making test calls right now, 2-3 calls per hour
  • I get this error on every call to 1.5 Pro, even with a 24-hour delay between calls

How can I get more information here?


Hi @James_Dillard

Welcome to the dev forum.

Gemini 1.5 Pro in the pay-as-you-go tier has the following limits:

  • 360 RPM
  • 2 million TPM
  • 10,000 RPD

Exceeding any one of these will return a rate-limit (429) error. I’d recommend retrying with exponential back-off when you receive such errors.
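A minimal sketch of retrying with exponential back-off and full jitter, in Python. It assumes the API call is wrapped in a zero-argument callable and that a 429 surfaces as an exception; `RateLimitError` and `call_with_backoff` are hypothetical names (with the Python SDK the 429 is typically raised as `google.api_core.exceptions.ResourceExhausted`). The `sleep` parameter is injectable only so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the SDK's 429 "Resource has been exhausted" error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on RateLimitError with exponential back-off and full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the 429 to the caller
            # The delay cap doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time in [0, cap] so clients don't retry in lockstep.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable: the loop always returns or raises")
```

Usage would look like `call_with_backoff(lambda: model.generate_content(prompt))`. Back-off only helps with transient bursts: a single request that is larger than the per-minute token limit will fail on every attempt.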

I get this error on the very first test call, every single time, on the paid tier of Gemini 1.5 Pro.

I’ve waited 24 hours between calls and still the same response…


I’d recommend counting tokens to find out exactly how many tokens are being sent to the API and whether they exceed the TPM limit.
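A sketch of that check, assuming the `google-generativeai` Python SDK, where `GenerativeModel.count_tokens()` returns a response with a `total_tokens` field. `check_against_tpm` is a hypothetical helper; the model is passed in so it can be exercised with a stub, and the `tpm_limit` value is an assumption you should take from the pricing page for your own tier.

```python
from typing import Any, Tuple


def check_against_tpm(model: Any, contents: Any, tpm_limit: int) -> Tuple[int, bool]:
    """Return (token_count, fits) for a single request.

    `model` is expected to behave like google.generativeai.GenerativeModel,
    whose count_tokens() result exposes a `total_tokens` attribute.
    """
    total = model.count_tokens(contents).total_tokens
    # A single request larger than the per-minute budget can never succeed,
    # no matter how long you wait between calls.
    return total, total <= tpm_limit


# Example with the real SDK (requires an API key):
# import google.generativeai as genai
# genai.configure(api_key=...)
# model = genai.GenerativeModel("gemini-1.5-pro-latest")
# tokens, fits = check_against_tpm(model, prompt, tpm_limit=32_000)
```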

Additionally, since rate limits are enforced at the account level, it’s best to account for every place where the API is being consumed.
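One way to do that accounting on the client side is a sliding one-minute window that every caller on the account records its token usage into. This is a hypothetical helper (`TokenWindow` is not part of any SDK), and it is only an approximation: the server’s own accounting may bucket time differently.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class TokenWindow:
    """Track tokens sent in the last `window` seconds across all callers sharing one account."""

    def __init__(self, tpm_limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.tpm_limit = tpm_limit
        self.window = window
        self.clock = clock
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)
        self._total = 0

    def _evict(self, now: float) -> None:
        # Drop records that are at least `window` seconds old.
        while self._events and now - self._events[0][0] >= self.window:
            _, tokens = self._events.popleft()
            self._total -= tokens

    def used(self) -> int:
        """Tokens recorded within the current window."""
        self._evict(self.clock())
        return self._total

    def can_send(self, tokens: int) -> bool:
        """Would sending `tokens` now stay within the per-minute limit?"""
        return self.used() + tokens <= self.tpm_limit

    def record(self, tokens: int) -> None:
        """Record a request's token count at the current time."""
        now = self.clock()
        self._evict(now)
        self._events.append((now, tokens))
        self._total += tokens
```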

I believe it’s well under the tokens-per-minute limit: 400k characters of English should be about 100k tokens, well under the 4M TPM limit. To exceed 4M TPM, each character would have to be 10 tokens, which seems unlikely.
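The arithmetic above, as a quick Python check. It assumes the rough rule of thumb from Google’s Gemini docs of about 4 characters per token; real counts vary with the text, so `count_tokens` is the authoritative number. The helper names are hypothetical. The same estimate also shows why a much smaller limit, such as 32k tokens/min, is exceeded roughly threefold by a single call.

```python
def estimate_tokens(num_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from a character count (heuristic, not exact)."""
    return round(num_chars / chars_per_token)


def tokens_per_char_to_reach(limit_tokens: int, num_chars: int) -> float:
    """How many tokens each character would need to be for the prompt to hit limit_tokens."""
    return limit_tokens / num_chars


prompt_chars = 400_000
print(estimate_tokens(prompt_chars))                      # estimated prompt size in tokens
print(tokens_per_char_to_reach(4_000_000, prompt_chars))  # tokens/char needed to reach 4M
print(estimate_tokens(prompt_chars) > 32_000)             # over a 32k tokens/min limit?
```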

Right now I’m only making test calls with the API, and these individual calls are the only ones being made on the entire account, so we can rule out account-level token usage as the cause.

Is there anything additional I need to do to “enable” a paid account? What stands out to me is that 1.5 Flash succeeds and 1.5 Pro fails, and the key difference between them is the free-tier TPM limit.

You have basically already answered your question. The Gemini API Pricing page on Google for Developers shows that the token arrival rate for 1.5 Pro is 32k tokens/min for an account without billing enabled. Any input larger than that will always exceed the limit, unless you deliberately split your input into slices and insert sleeps of almost one minute between the parts.
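A sketch of that slicing workaround, under stated assumptions: each slice is sent as its own request (so the model only sees one slice per call, which is not equivalent to a single large-context call), the ~4 characters/token heuristic is used for sizing, and `split_by_token_budget`, `send_paced`, and the `send` callable are hypothetical stand-ins for your own API wrapper. Choosing `max_tokens` below 32k leaves headroom for estimation error and any instructions you add to each slice.

```python
import time
from typing import Callable, List


def split_by_token_budget(text: str, max_tokens: int,
                          chars_per_token: float = 4.0) -> List[str]:
    """Split text into slices whose *estimated* token count stays within max_tokens.

    Splits on raw character offsets; a real version would prefer paragraph boundaries.
    """
    max_chars = max(1, int(max_tokens * chars_per_token))
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def send_paced(slices: List[str], send: Callable[[str], str],
               pause_s: float = 60.0,
               sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Send each slice as a separate request, pausing between requests
    so that each one falls into a fresh per-minute window."""
    results = []
    for i, part in enumerate(slices):
        if i > 0:
            sleep(pause_s)
        results.append(send(part))
    return results
```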

The token arrival rate on 1.5 Flash is about 30x more generous, which allows a large context without hitting that rate limit.

A billed account behaves like a free (unbilled) account until an actual bill has been paid (which indicates a good likelihood that the following bill will be paid too). After that, the limits are observed to increase progressively. You should be able to get higher limits faster than the payment-monitoring system grants them automatically by contacting sales directly.

Thank you @OrangiaNebula! I’ll see if I can contact sales.

Which page are you looking at on the cloud console?

Last I checked, the quota page isn’t showing any quota usage at all, so it isn’t very helpful.


I’m having the same issue on my end, also on a paid account.