What's the rate limit for the experimental models?

Karthik_Kannan · September 4, 2024, 9:16pm

Hi,

I’m using the latest experimental model (gemini-1.5-pro-exp-0827) in an internal beta and I’m getting rate limited after just 3-4 messages. Specifically, I’m getting a 429 error.

I’m on the paid plan and I can’t seem to find what the limits are or what I’m doing wrong.

Can anyone help me with the limits or how I can increase them? Thank you!

Govind_Keshari · September 5, 2024, 2:44pm

Hi @Karthik_Kannan. Experimental models are free to use and have the same quotas as the free tier. Higher quotas will be available once the models move out of experimental status. They are not intended for production use, because there is no guarantee that an experimental model will become a stable model in the future.
In your case (gemini-1.5-pro-exp-0827), the rate limits are 2 requests per minute (RPM), 32,000 tokens per minute (TPM), and 50 requests per day (RPD). You can refer to the doc below:

The 429 error you are getting means RESOURCE_EXHAUSTED. Please refer to the doc below:
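
To stay under the limits quoted above without relying on the server to reject requests, a client can track its own call timestamps. The following is only a minimal sketch: the `SlidingWindowLimiter` class is a hypothetical helper, not part of any Gemini SDK, and it assumes a rolling window, whereas the real daily quota may reset at a fixed time instead.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side limiter: at most `limit` calls per `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls = deque()  # timestamps of calls still inside the window

    def wait_time(self):
        """Return seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            return 0.0
        # The oldest call must leave the window before another is allowed.
        return self.window - (now - self.calls[0])

    def record(self):
        """Record that a call was just made."""
        self.calls.append(self.clock())


# Limits quoted in this thread for gemini-1.5-pro-exp-0827 (assumed rolling windows).
per_minute = SlidingWindowLimiter(limit=2, window=60)
per_day = SlidingWindowLimiter(limit=50, window=24 * 60 * 60)
```

Before each request, the caller would wait for the larger of `per_minute.wait_time()` and `per_day.wait_time()`, send the request, then call `record()` on both limiters.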

OrangiaNebula · September 5, 2024, 5:26pm

You can retry the operation that returns the 429 code. For example, the code snippet in “Extract structured data using function calling | Gemini API | Google AI for Developers” shows how to use request_options to retry with exponential backoff.
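
For readers who want to see the idea independent of any SDK, here is a rough sketch of retry with exponential backoff and jitter. `RateLimitError` and `call_with_backoff` are hypothetical names chosen for illustration; substitute whatever exception your client actually raises on a 429.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""


def call_with_backoff(fn, max_attempts=5, initial=1.0, multiplier=2.0,
                      maximum=60.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponentially growing delays."""
    delay = initial
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # "Full jitter": sleep a random time up to the current delay cap.
            sleep(random.uniform(0, delay))
            delay = min(delay * multiplier, maximum)
```

With a limit as low as 2 RPM, the delays probably need to reach tens of seconds before a retry has a realistic chance of succeeding, so the defaults above are only placeholders.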

Hope that helps!

Jay · September 5, 2024, 8:03pm

With a rate limit of 50 requests per day, simply enforcing a 30+ second wait between submissions is still enough to exhaust the daily free usage of pro-experimental in half an hour. A chatbot that shows “waiting xx seconds until I can send again” after each submission can fit a single-user chat scenario, where any 429 must then indicate the daily limit.
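
The arithmetic behind that estimate, and the countdown such a chatbot would display, can be sketched as follows. Both function names are made up for illustration, and the figures are the 2 RPM and 50 RPD limits quoted earlier in the thread.

```python
def minutes_to_exhaust(rpm=2, rpd=50):
    """Minutes needed to use up the daily quota when sending at the per-minute cap."""
    spacing_s = 60 / rpm          # 30 seconds between requests at 2 RPM
    return rpd * spacing_s / 60   # 50 requests * 30 s = 1500 s = 25 minutes


def seconds_until_next_send(last_sent, now, rpm=2):
    """Seconds the user still has to wait before the next message may be sent."""
    spacing_s = 60 / rpm
    return max(0.0, spacing_s - (now - last_sent))


print(minutes_to_exhaust())                   # 25.0
print(seconds_until_next_send(100.0, 112.0))  # 18.0
```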

A backoff/retry must be written so that it can abort and record a per-model flag such as “RPD hit”, instead of persisting against that daily cutoff. Backoff/retry is appropriate for flash’s much higher free rate limits, where more experimental automation is possible.
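
One way to sketch such an aborting retry is shown below. It rests on a heuristic, not on documented API behavior: if a request is still rejected after waiting out several full per-minute windows, the code assumes the daily quota is gone and stops trying. `RateLimitError`, `DailyQuotaExhausted`, and `QuotaAwareCaller` are hypothetical names.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""


class DailyQuotaExhausted(Exception):
    """Raised once the caller has concluded the daily limit was hit."""


class QuotaAwareCaller:
    def __init__(self, max_retries=3, wait=60.0, sleep=time.sleep):
        self.max_retries = max_retries
        self.wait = wait              # one full per-minute window between tries
        self.sleep = sleep
        self.daily_exhausted = False  # the per-model "daily limit hit" flag

    def call(self, fn):
        if self.daily_exhausted:
            # Fail fast instead of persisting against the daily cutoff.
            raise DailyQuotaExhausted("daily limit already hit for this model")
        for attempt in range(self.max_retries + 1):
            try:
                return fn()
            except RateLimitError:
                if attempt < self.max_retries:
                    self.sleep(self.wait)
        # Assumption: still limited after waiting out per-minute windows
        # means the daily quota is spent.
        self.daily_exhausted = True
        raise DailyQuotaExhausted("still rate limited after retries")
```

Clearing `daily_exhausted` is left to the application, for example when the quota is expected to reset.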

SamRahimi420 · October 31, 2024, 11:04pm

Are these rate limits per account or per IP? Let’s say I build an experimental research tool for evaluating models, where the users are all AI researchers who provide their own API keys. Will there be an issue with rate limiting if the requests are sent via my server? Or is it better to use an architecture where requests are sent directly from client side code running in the user’s browser on their personal device?

OrangiaNebula · November 1, 2024, 1:13am

Welcome to the forum.
Rate limits are accounted per project ID. In the scenario you describe, where each researcher uses their own API key and the keys come from different projects, you can route all the traffic through a single IP address or use separate clients, whichever works best for your research. It makes no difference to the backend or to the limit accounting.
