I’m planning to use Google Cloud Platform’s Vertex AI for a few projects. So, I was looking through the documentation in the section on rate limits and I came across this:
But I haven’t found any information anywhere about the algorithm that enforces these limits. I have two scenarios in mind:
- First scenario: The limits apply to fixed clock windows. For example, between 08:00:00 AM and 08:00:59 AM there are 4 million tokens available, and at 08:01:00 AM the token count is reset.
- Second scenario: The window slides as requests are made, so each request is counted against the usage of the preceding 60 seconds.
Or maybe it’s different from the scenarios outlined.
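To make the difference between the two scenarios concrete, here is a minimal Go sketch of a fixed-window and a sliding-window token limiter. It only illustrates the two textbook algorithms; it is not Vertex AI's actual implementation, which is not documented in this thread. The type names `FixedWindow` and `SlidingWindow` and the helper `compare` are made up for this example, and the 4 million token limit is taken from the example above.

```go
package main

import (
	"fmt"
	"time"
)

// FixedWindow allows up to Limit tokens per clock-aligned window (scenario 1).
type FixedWindow struct {
	Limit  int
	Window time.Duration
	start  time.Time // start of the window currently being counted
	used   int
}

// Allow reports whether a request for the given tokens fits at time now.
func (f *FixedWindow) Allow(now time.Time, tokens int) bool {
	ws := now.Truncate(f.Window) // e.g. 08:00:50 -> 08:00:00
	if !ws.Equal(f.start) {
		f.start, f.used = ws, 0 // a new window began: reset the counter
	}
	if f.used+tokens > f.Limit {
		return false
	}
	f.used += tokens
	return true
}

type event struct {
	at     time.Time
	tokens int
}

// SlidingWindow allows up to Limit tokens over the trailing Window (scenario 2).
type SlidingWindow struct {
	Limit  int
	Window time.Duration
	events []event // accepted requests, oldest first
}

// Allow reports whether a request for the given tokens fits at time now.
func (s *SlidingWindow) Allow(now time.Time, tokens int) bool {
	cutoff := now.Add(-s.Window)
	i := 0
	for i < len(s.events) && !s.events[i].at.After(cutoff) {
		i++ // this event has aged out of the trailing window
	}
	s.events = s.events[i:]
	used := 0
	for _, e := range s.events {
		used += e.tokens
	}
	if used+tokens > s.Limit {
		return false
	}
	s.events = append(s.events, event{now, tokens})
	return true
}

// compare spends the whole budget firstSec seconds after 08:00:00, then asks
// both limiters for 1000 more tokens secondSec seconds after 08:00:00.
func compare(firstSec, secondSec int) (fixedOK, slidingOK bool) {
	base := time.Date(2024, 1, 1, 8, 0, 0, 0, time.UTC)
	fw := &FixedWindow{Limit: 4_000_000, Window: time.Minute}
	sw := &SlidingWindow{Limit: 4_000_000, Window: time.Minute}
	first := base.Add(time.Duration(firstSec) * time.Second)
	second := base.Add(time.Duration(secondSec) * time.Second)
	fw.Allow(first, 4_000_000)
	sw.Allow(first, 4_000_000)
	return fw.Allow(second, 1000), sw.Allow(second, 1000)
}

func main() {
	// Burst at 08:00:50, follow-up at 08:01:05.
	fmt.Println(compare(50, 65)) // prints "true false"
}
```

A burst late in one minute followed by a request early in the next is the case that separates the two: the fixed window has already reset, while the sliding window still counts the burst. Sending that pattern against the real API is one way to tell the scenarios apart.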
I would appreciate it if someone could explain how Google calculates it, or point me to the section of the documentation that covers this, since I haven’t been able to find it.
Hi @diegol116, welcome to the forum!
You can refer to this doc for pricing. Also, you can use Google Cloud’s pricing calculator to estimate the cost.
Thank you very much for your reply @Govind_Keshari. The resource is very useful; however, I need the details of the internal rate limit algorithm for a specific implementation at my company, and I have not been able to find that information.
Your first scenario is correct. Rate limits are bounded by fixed clock times. A per-minute limit is bounded by the clock start and end of a minute. A daily limit is bounded by the clock start and end of a day (i.e., midnight) in Mountain View, CA, USA.
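If the limits really are clock-aligned as described here, a client could compute the next reset times as in the Go sketch below. This is only an illustration of that claim, which the thread does not confirm. The function names are hypothetical, and `America/Los_Angeles` is used as the time zone for Mountain View, CA. The blank import of `time/tzdata` embeds the time zone database so that `time.LoadLocation` also works on systems without one installed.

```go
package main

import (
	"fmt"
	"time"
	_ "time/tzdata" // embed tz database so LoadLocation works on any system
)

// nextMinuteReset returns the start of the next clock minute after now.
func nextMinuteReset(now time.Time) time.Time {
	return now.Truncate(time.Minute).Add(time.Minute)
}

// nextDailyReset returns the next midnight in loc after now.
// time.Date normalizes Day()+1 when it overflows the month.
func nextDailyReset(now time.Time, loc *time.Location) time.Time {
	local := now.In(loc)
	return time.Date(local.Year(), local.Month(), local.Day()+1, 0, 0, 0, 0, loc)
}

// secondsUntilResets reports the whole seconds remaining until the assumed
// per-minute and daily resets, for a Unix timestamp in seconds.
func secondsUntilResets(unix int64) (toMinute, toDay int64) {
	// Assumption: the daily window follows Mountain View local time.
	loc, err := time.LoadLocation("America/Los_Angeles")
	if err != nil {
		panic(err)
	}
	now := time.Unix(unix, 0)
	toMinute = int64(nextMinuteReset(now).Sub(now) / time.Second)
	toDay = int64(nextDailyReset(now, loc).Sub(now) / time.Second)
	return toMinute, toDay
}

func main() {
	m, d := secondsUntilResets(time.Now().Unix())
	fmt.Printf("next minute window in %ds, next daily window in %ds\n", m, d)
}
```

Comparing these predicted reset times against when rejected requests start succeeding again would show whether the quota actually follows clock boundaries.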
I wish it were like that… but after doing a lot of tests, it doesn’t seem to reset the total amount of tokens exactly every minute. I really don’t know how it works.
How are you testing it?
The console has a bit of a lag when showing usage. (Last time I tested it, it was about 10-15 minutes.)
I am using the Vertex AI Go SDK directly. It does not reset the tokens at exact clock times. Unfortunately, I could not find any clear pattern that would tell me which algorithm is used for the rate limit.