How are Vertex AI rate limits calculated on GCP?

diegol116 · October 28, 2024, 4:46am

I’m planning to use Google Cloud Platform’s Vertex AI for a few projects. So, I was looking through the documentation in the section on rate limits and I came across this:

But I haven’t found any information anywhere about the algorithm that enforces these limits. I have two scenarios in mind:

First scenario: The limits are at fixed times. For example, between 08:00:00 AM and 08:00:59 AM there are 4 million tokens available and at 08:01:00 AM the tokens are reset.
Second scenario: The limits move as requests are made.
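
To make the two scenarios concrete, here is a minimal sketch of each as a token limiter. It is illustrative only and is not Google's implementation; the class names, the `allow` method, and the 60-second default are hypothetical, and timestamps are assumed to be epoch seconds so that `now // 60` lines up with clock minutes.

```python
from collections import deque


class FixedWindowLimiter:
    """Scenario 1: the budget resets at each clock-aligned window boundary."""

    def __init__(self, limit, window_s=60):
        self.limit = limit
        self.window_s = window_s
        self.window_id = None
        self.used = 0

    def allow(self, now, tokens):
        # A new window starts whenever the integer window index changes.
        window_id = int(now // self.window_s)
        if window_id != self.window_id:
            self.window_id = window_id
            self.used = 0
        if self.used + tokens > self.limit:
            return False
        self.used += tokens
        return True


class SlidingWindowLimiter:
    """Scenario 2: only usage from the trailing window_s seconds counts."""

    def __init__(self, limit, window_s=60):
        self.limit = limit
        self.window_s = window_s
        self.events = deque()  # (timestamp, tokens) of accepted requests
        self.used = 0

    def allow(self, now, tokens):
        # Drop accepted requests that have aged out of the trailing window.
        while self.events and self.events[0][0] <= now - self.window_s:
            _, old_tokens = self.events.popleft()
            self.used -= old_tokens
        if self.used + tokens > self.limit:
            return False
        self.events.append((now, tokens))
        self.used += tokens
        return True
```

The difference shows up around a minute boundary: a full-budget request at second 59 followed by another at second 61 is accepted by the fixed-window version but rejected by the sliding-window version.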

Or maybe it’s different from the scenarios outlined.

I would appreciate it if someone could explain how Google calculates it, or point me to the section of the documentation that covers this, since I haven’t been able to find it.

Govind_Keshari · October 28, 2024, 12:59pm

Hi @diegol116, welcome to the forum!

You can refer to this doc for pricing. Also, you can use Google Cloud’s pricing calculator to estimate the cost.

diegol116 · October 28, 2024, 1:26pm

Thank you very much for your reply, @Govind_Keshari. The resource is very useful. However, I need the details of the internal rate limit algorithm for a specific implementation at my company, and I have not been able to find that information.

afirstenberg · October 28, 2024, 4:00pm

Your first scenario is correct. Rate limits are bounded by fixed clock times. A per-minute limit is bounded by the clock start and end of a minute. A daily limit is bounded by the clock start and end of a day (i.e., midnight) in Mountain View, CA, USA.
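
If the limits do work this way, the next reset times can be computed from the clock alone. The following is a small sketch under that assumption; the function names are hypothetical, and the time zone for the daily reset is passed in by the caller rather than hard-coded.

```python
from datetime import datetime, timedelta, timezone


def next_minute_reset(now):
    """Start of the next clock minute after `now`."""
    return now.replace(second=0, microsecond=0) + timedelta(minutes=1)


def next_daily_reset(now, tz):
    """Next local midnight in `tz`, returned as an aware datetime in `tz`."""
    local = now.astimezone(tz)
    next_day = local.date() + timedelta(days=1)
    return datetime(next_day.year, next_day.month, next_day.day, tzinfo=tz)


# Example: the next per-minute boundary relative to the current time.
now = datetime.now(timezone.utc)
print(next_minute_reset(now))
```

For the daily case, passing `zoneinfo.ZoneInfo("America/Los_Angeles")` as `tz` would give midnight in Mountain View, including daylight saving changes.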

diegol116 · October 29, 2024, 3:42am

I wish it were like that… but after doing a lot of tests, it doesn’t seem to reset the total amount of tokens exactly every minute. I really don’t know how it works.

afirstenberg · October 29, 2024, 7:49am

How are you testing it?

The console has a bit of a lag when showing usage. (Last time I tested it, it was about 10-15 minutes.)

diegol116 · October 31, 2024, 2:03am

I am using the Vertex AI Go SDK directly. It does not reset tokens at exact times. Unfortunately, I did not find any clear pattern that would let me determine which algorithm is used for the rate limit.
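
One way to look for a pattern in test logs like these is to compare peak accepted usage per clock-aligned minute against peak usage in any trailing 60-second span. This is a rough sketch, not a definitive test; the function name and the `(timestamp_s, tokens)` record format are hypothetical, and it assumes the client clock is close to the server clock.

```python
def max_usage_by_window(records, window_s=60):
    """records: list of (timestamp_s, tokens) for accepted requests.

    Returns a pair: the maximum tokens accepted in any clock-aligned window,
    and the maximum tokens accepted in any trailing window_s-second span.
    """
    records = sorted(records)

    # Total accepted tokens per clock-aligned bucket.
    per_bucket = {}
    for ts, tokens in records:
        bucket = int(ts // window_s)
        per_bucket[bucket] = per_bucket.get(bucket, 0) + tokens
    max_fixed = max(per_bucket.values(), default=0)

    # Two-pointer scan for the busiest trailing span ending at each request.
    max_sliding = 0
    running = 0
    start = 0
    for ts, tokens in records:
        running += tokens
        while records[start][0] <= ts - window_s:
            running -= records[start][1]
            start += 1
        max_sliding = max(max_sliding, running)

    return max_fixed, max_sliding
```

If the trailing-span maximum exceeds the documented quota while no clock-aligned minute does, that is consistent with fixed windows. If both stay at or under the quota, the data does not distinguish the two scenarios.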
