I have an application that uses the Gemini API for AI tasks. I sell this solution to my customers, but I’m running into the tokens-per-minute limit. I’m using the Gemini 1.5 Flash 002 model, which has a limit of 4 million tokens per minute. When multiple users are using the application, this limit is reached quickly and the system stalls. How can I request an increase to this limit? I’m based in Brazil and have tried through the Google Cloud console and Google’s partners in Brazil, but so far I haven’t been able to get the limit increased.
Welcome to the forums!
If you are using the Gemini API through Vertex AI, then the 002 models use dynamic shared quota by default. You can purchase provisioned throughput based on your expected demand.
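For context, the main difference is the endpoint and authentication: on Vertex AI you call the `aiplatform.googleapis.com` endpoint for your own project and region with OAuth credentials, rather than the AI Studio endpoint with an API key. Here is a rough PHP sketch of a Vertex AI call (the project ID, region, and token handling are placeholders; in a real app you would obtain the access token via Google’s auth library or Application Default Credentials):

```php
<?php
// Minimal sketch: calling Gemini 1.5 Flash 002 through the Vertex AI endpoint,
// where dynamic shared quota / provisioned throughput applies.
// PROJECT ID, LOCATION and the access token are placeholders.

$projectId   = 'your-project-id';           // placeholder
$location    = 'us-central1';               // placeholder region
$model       = 'gemini-1.5-flash-002';
$accessToken = getenv('GCP_ACCESS_TOKEN');  // e.g. from `gcloud auth print-access-token`

$url = "https://{$location}-aiplatform.googleapis.com/v1/projects/{$projectId}"
     . "/locations/{$location}/publishers/google/models/{$model}:generateContent";

$body = json_encode([
    'contents' => [
        ['role' => 'user', 'parts' => [['text' => 'Hello from Vertex AI']]],
    ],
]);

$ch = curl_init($url);
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $body,
    CURLOPT_HTTPHEADER     => [
        'Content-Type: application/json',
        "Authorization: Bearer {$accessToken}",
    ],
]);

$response = curl_exec($ch);
$status   = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

echo "HTTP {$status}\n";
echo $response . "\n";
```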
Thank you for the warm welcome.
I am using it through PHP.
Following the link you sent me, it seems that provisioned throughput only applies to Vertex AI. Is the process to increase the tokens-per-minute limit the same when using it from PHP?
Which library in PHP are you using?
Currently, if you’re using the AI Studio Gemini API, then you are limited to the paid-tier quota. You said you’ve tried “through the Google Cloud console” - what, exactly, have you done?
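While you sort out the quota side, one common application-level mitigation is to catch the rate-limit response (HTTP 429, RESOURCE_EXHAUSTED) and retry with exponential backoff, so bursts from multiple users queue up instead of failing outright. A minimal sketch against the public REST endpoint is below; the function name and retry policy are purely illustrative, and this smooths over short bursts but does not raise the underlying quota:

```php
<?php
// Sketch: retrying AI Studio Gemini API calls with exponential backoff when the
// per-minute quota is hit (HTTP 429). Illustrative only.

function generateContentWithBackoff(string $apiKey, string $prompt, int $maxRetries = 5): ?array
{
    $url = 'https://generativelanguage.googleapis.com/v1beta/models/'
         . 'gemini-1.5-flash-002:generateContent?key=' . urlencode($apiKey);

    $body = json_encode([
        'contents' => [['parts' => [['text' => $prompt]]]],
    ]);

    for ($attempt = 0; $attempt <= $maxRetries; $attempt++) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_POST           => true,
            CURLOPT_POSTFIELDS     => $body,
            CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        ]);
        $response = curl_exec($ch);
        $status   = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        if ($status === 429) {
            // Quota exhausted: wait 1s, 2s, 4s, ... before retrying.
            sleep(2 ** $attempt);
            continue;
        }

        return $response === false ? null : json_decode($response, true);
    }

    return null; // gave up after exhausting retries
}
```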
I am using this library: google-gemini-php/client (https://github.com/google-gemini-php/client), a community-maintained PHP API client for the Gemini API.
In the console, I tried to edit the quotas under “Edit quota” (“editar cota” in the Portuguese-language console):