What's the rate limit for the experimental models?

Hi,

I’m using the latest experimental model (gemini-1.5-pro-exp-0827) in an internal beta, and I’m getting rate limited after just 3-4 messages. Specifically, I’m getting a 429 error.

I’m on the paid plan, and I can’t seem to find what the limits are or what I’m doing wrong.

Can anyone help me with the limits or how I can increase them? Thank you!

Hi @Karthik_Kannan. Experimental models are free to use and are subject to free-tier quotas, even on a paid plan. Higher quotas will be available once a model moves out of experimental. Experimental models are not meant for production use, because there is no guarantee that an experimental model will become a stable model in the future.
In your case (gemini-1.5-pro-exp-0827), the rate limits are 2 RPM (requests per minute), 32,000 TPM (tokens per minute), and 50 RPD (requests per day). You can refer to the doc below:

The 429 error you are getting corresponds to RESOURCE_EXHAUSTED. Please refer to the doc below:

You can retry the operation that returns the 429 code. For example, the code snippet in "Extract structured data using function calling" (Gemini API, Google AI for Developers) shows how to pass request_options with a retry that uses exponential backoff.
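As a library-agnostic illustration of retrying with exponential backoff, here is a minimal sketch. It is not the snippet from the linked doc: `RateLimitError` and `call_with_backoff` are hypothetical names, and `send` stands for whatever function issues your request and raises on a 429.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call send() and retry on RateLimitError with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            # Delay doubles on each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter so that several clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

With a 2 RPM limit, short delays will rarely clear the per-minute window, so a larger `base_delay` (assumed here, not documented) is probably more useful for this model.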

Hope that helps!

With a rate limit of 50 requests per day, simply enforcing a 30+ second wait between submissions (to stay under 2 RPM) still exhausts the daily free usage of pro-experimental in about half an hour. A chatbot that shows “waiting xx seconds until I can send again” after each submit can fit a single-user chat scenario, and any 429 it then receives must indicate the daily limit.
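The arithmetic, and the "waiting xx seconds" countdown, can be sketched as follows. The limits are the ones quoted earlier in the thread; `Pacer` and its methods are hypothetical names for illustration only.

```python
import time

RPM_LIMIT = 2    # requests per minute, as quoted in the thread
RPD_LIMIT = 50   # requests per day, as quoted in the thread

MIN_INTERVAL_S = 60 / RPM_LIMIT  # 30 seconds between sends
# Time from the first send to the 50th when sending as fast as 2 RPM allows.
minutes_to_exhaust = (RPD_LIMIT - 1) * MIN_INTERVAL_S / 60  # 24.5 minutes


class Pacer:
    """Tells a chat UI how long to wait before the next send is allowed."""

    def __init__(self, min_interval=MIN_INTERVAL_S, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.last_sent = None

    def seconds_until_ready(self):
        """Seconds left before the next request stays under the per-minute limit."""
        if self.last_sent is None:
            return 0.0
        elapsed = self.clock() - self.last_sent
        return max(0.0, self.min_interval - elapsed)

    def mark_sent(self):
        """Record that a request was just submitted."""
        self.last_sent = self.clock()
```

A UI would call `seconds_until_ready()` to render the countdown and `mark_sent()` on each submit.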

A backoff/retry routine must be able to abort and save a per-model-tier flag such as “RPD hit”, rather than keep retrying against the daily cutoff. Backoff/retry is appropriate for flash’s much higher free rate limits, where more experimental automation is possible.
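One way such an abort could look, as a rough sketch: the sender paces itself under the per-minute limit, so a 429 that persists across paced attempts is treated as the daily cutoff and remembered. All class and attribute names are hypothetical, and the inference "paced 429 means daily limit" is the heuristic from this post, not documented API behavior.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""


class DailyQuotaExhausted(Exception):
    """Raised instead of retrying once the model tier is flagged as out of quota."""


class QuotaAwareSender:
    def __init__(self, min_interval=30.0, max_retries=1,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.max_retries = max_retries
        self.clock = clock
        self.sleep = sleep
        self.last_sent = None
        self.daily_limit_hit = False  # saved flag for this model tier

    def _wait_for_slot(self):
        # Sleep until min_interval has passed since the previous attempt.
        if self.last_sent is not None:
            remaining = self.min_interval - (self.clock() - self.last_sent)
            if remaining > 0:
                self.sleep(remaining)

    def send(self, request_fn):
        if self.daily_limit_hit:
            raise DailyQuotaExhausted("daily quota already flagged; not sending")
        for _ in range(self.max_retries + 1):
            self._wait_for_slot()
            self.last_sent = self.clock()
            try:
                return request_fn()
            except RateLimitError:
                continue  # try again after another full interval
        # Every paced attempt was rejected: assume the daily limit and stop.
        self.daily_limit_hit = True
        raise DailyQuotaExhausted("429 despite pacing; assuming daily limit")
```

Clearing the flag when the daily quota resets is left out of this sketch.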

Are these rate limits per account or per IP? Let’s say I build an experimental research tool for evaluating models, where the users are all AI researchers who provide their own API keys. Will there be an issue with rate limiting if the requests are sent via my server? Or is it better to use an architecture where requests are sent directly from client-side code running in the user’s browser on their personal device?

Welcome to the forum.
The rate limits are accounted per project ID. In the scenario you describe, where each researcher uses their own API key and the keys come from different projects, you can route all the traffic through a single IP address or use separate clients, whichever works best for your research. It makes no difference to the backend or to the limit accounting.
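To make the per-project accounting concrete, here is a toy model. It is purely illustrative and says nothing about how the backend is actually implemented; `ProjectQuota` and its parameters are hypothetical.

```python
from collections import defaultdict


class ProjectQuota:
    """Toy model: counts requests per project ID and ignores the source IP."""

    def __init__(self, requests_per_day=50):
        self.requests_per_day = requests_per_day
        self.used = defaultdict(int)

    def allow(self, project_id, source_ip):
        # source_ip is accepted only to show it plays no part in the decision.
        if self.used[project_id] >= self.requests_per_day:
            return False  # this project would receive a 429
        self.used[project_id] += 1
        return True


quota = ProjectQuota(requests_per_day=2)
shared_ip = "203.0.113.7"  # every researcher routed through one server

# Researcher A uses up project A's quota; researcher B is unaffected.
results_a = [quota.allow("project-a", shared_ip) for _ in range(3)]
result_b = quota.allow("project-b", shared_ip)
```

Here `results_a` ends with a rejection while `result_b` is still allowed, even though both share one IP address.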