In Gemini's rate limiting, are input tokens counted toward the TPM (tokens per minute) limit as soon as a request is initiated, or are they tallied asynchronously only after the request completes?

Documentation on Gemini’s rate limiting is scarce. I’m currently developing a rate limiter for my project, which integrates with numerous models. I need to know: for Gemini, are input tokens counted toward the TPM rate limit at the moment a request is initiated?
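
Until the accounting is confirmed, a client-side limiter can take the conservative route and reserve input tokens the moment a request starts. The sketch below is a minimal illustration under assumptions that are not confirmed for Gemini: that TPM behaves like a sliding 60-second window, and that input tokens count at request start. `TpmWindow` and `try_reserve` are hypothetical names and are not part of any Gemini SDK.

```python
import time
from collections import deque


class TpmWindow:
    """Sliding-window token budget (hypothetical helper, not a Gemini API)."""

    def __init__(self, tpm_limit, window_s=60.0, clock=time.monotonic):
        self.tpm_limit = tpm_limit
        self.window_s = window_s  # assumed window length; not confirmed for Gemini
        self.clock = clock        # injectable so the limiter can be tested
        self.events = deque()     # (timestamp, tokens) in arrival order
        self.used = 0             # tokens currently inside the window

    def _expire(self):
        # Drop reservations that have aged out of the window.
        cutoff = self.clock() - self.window_s
        while self.events and self.events[0][0] <= cutoff:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def try_reserve(self, input_tokens):
        """Reserve input tokens at request start; False means wait or back off."""
        self._expire()
        if self.used + input_tokens > self.tpm_limit:
            return False
        self.events.append((self.clock(), input_tokens))
        self.used += input_tokens
        return True
```

If the server actually counts tokens only after completion, this approach just under-uses the quota slightly; the opposite assumption risks unexpected 429s under concurrency.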

Hi @hh_yu ,

Apologies for the delayed response. Could you please share whether you’ve noticed any behavior (for example, delayed 429s) that might indicate tokens are accounted for after completion?

Thanks