Just 5 requests per minute is very low; will this increase in the near future?
Hi Ana, thanks for your post!
Which model are you referring to? Gemini 1.5 Pro has 2 RPM at the moment and Gemini 1.0 Pro has 15.
We have a pay-as-you-go option launching very soon that provides higher RPMs.
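In the meantime, catching 429 errors and retrying with a short backoff usually helps on the free tier. A minimal sketch, assuming the google-generativeai Python SDK and that the quota error surfaces as google.api_core's ResourceExhausted (the model name and API key are placeholders):

```python
import time

import google.generativeai as genai
from google.api_core import exceptions as gexc

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-pro-latest")  # placeholder model name

def generate_with_backoff(prompt, max_retries=5):
    """Retry generate_content on 429 (ResourceExhausted) with exponential backoff."""
    delay = 2.0
    for _ in range(max_retries):
        try:
            return model.generate_content(prompt)
        except gexc.ResourceExhausted:
            # Hit the per-minute quota; wait and try again with a longer delay.
            time.sleep(delay)
            delay *= 2
    raise RuntimeError("Still rate limited after retries")
```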
I am a bit confused about the rate limits too. As far as I can see, the limits for the latest Gemini models are:
Free tier: 2 RPM / 50 RPD
Pay-as-you-go: 10 RPM / 2,000 RPD
The same site also states that you can request higher rate limits, but by how much?
It also mentions that I can migrate to the Vertex AI platform on Google Cloud, which may offer higher limits. But according to the following table, the rate limits are the same:
| Features | Google AI Gemini API | Google Cloud Vertex AI Gemini API |
| --- | --- | --- |
| Latest Gemini models | Gemini Pro and Gemini Ultra | Gemini Pro and Gemini Ultra |
| Sign up | Google account | Google Cloud account (with terms agreement and billing) |
| Authentication | API key | Google Cloud service account |
| User interface playground | Google AI Studio | Vertex AI Studio |
| API & SDK | Python, Node.js, Android (Kotlin/Java), Swift, Go | SDK supports Python, Node.js, Java, Go |
| Free tier | Yes | $300 Google Cloud credit for new users |
| Quota (requests per minute) | 60 (can request increase) | Increase upon request (default: 60) |
| Enterprise support | No | Customer encryption key, Virtual Private Cloud, data residency, access transparency, scalable infrastructure for application hosting, databases and data storage |
| MLOps | No | Full MLOps on Vertex AI (examples: model evaluation, Model Monitoring, Model Registry) |
(Ref: link)
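For context, the sign-up and authentication rows in that table boil down to roughly this on the Python side (a rough sketch based on the public SDK docs; the project ID and model name are placeholders I made up):

```python
# Google AI Gemini API: authenticate with a plain API key.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
reply = genai.GenerativeModel("gemini-1.0-pro").generate_content("Hello")

# Vertex AI Gemini API: authenticate through a Google Cloud project
# and service account (Application Default Credentials), no API key.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-gcp-project", location="us-central1")  # placeholders
reply = GenerativeModel("gemini-1.0-pro").generate_content("Hello")
```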
If I’m not mistaken, until yesterday it was even lower: 5 RPM on pay-as-you-go.
Another thing: the target date for releasing pay-as-you-go access used to be May 2nd, and now it is May 14th.
In a production application this limit would certainly be a problem.
I’m planning to use pay-as-you-go, but 10 RPM would not be enough for my application, because for each user request I need to make 2 requests to Gemini to refine the response. That works out to at most 5 user requests per minute.
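To make that concrete, this is roughly how I would have to throttle my backend to stay under 10 RPM. A sketch only: the answer_user helper and the model name are hypothetical, and the limit is the 10 RPM figure quoted above.

```python
import threading
import time

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-pro-latest")  # placeholder model name

RPM_LIMIT = 10  # pay-as-you-go limit quoted in the docs above

_lock = threading.Lock()
_call_times = []  # timestamps of Gemini calls in the last minute

def _wait_for_slot():
    """Block until another call fits inside the 10-calls-per-minute window."""
    while True:
        with _lock:
            now = time.monotonic()
            # Keep only calls made within the last 60 seconds.
            _call_times[:] = [t for t in _call_times if now - t < 60]
            if len(_call_times) < RPM_LIMIT:
                _call_times.append(now)
                return
            sleep_for = 60 - (now - _call_times[0])
        time.sleep(max(sleep_for, 0.1))

def answer_user(question: str) -> str:
    """Two Gemini calls per user request, so at most 5 user requests per minute."""
    _wait_for_slot()
    draft = model.generate_content(question)
    _wait_for_slot()
    refined = model.generate_content(
        f"Improve this answer:\n{draft.text}\n\nQuestion: {question}"
    )
    return refined.text
```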