Gemini 2.0 Flash Multimodal Live API rate limits

Hello Gemini team,

I want to use the Multimodal Live API in production to launch my SaaS, but I came across these two limitations:

  1. Rate limits
    The following rate limits apply:
    3 concurrent sessions per API key

and

  1. Maximum session duration
    Session duration is limited to up to 15 minutes for audio. When the session duration exceeds the limit, the connection is terminated.

Source: Multimodal Live API  |  Gemini API  |  Google AI for Developers

These are very restrictive for a production launch, especially the limit of 3 concurrent sessions. Do you have a higher plan I can upgrade to, or a solution planned for this?

Thanks!


Welcome to the forum. The full name of the model is Gemini 2.0 Flash Experimental. That last word means it is not intended for production, and by clicking the “I agree” button you agreed not to use it in production.

It will eventually get promoted to non-experimental status. Then you can go ahead and use it.

Hope that helps.


If you’re working with Node.js, I made a little “Key Mixer” library specifically for extending Gemini rate limits. It works as a substitute for normal .env environment variables and rotates multiple keys on a round-robin basis.

Essentially, what you do is:

  • obtain several API keys, one for each of your Google accounts
  • add them to a keystore.json file (see the example in the npm docs)
  • npm install key-mixer and then use the package as per the docs. Each time you get the key for a particular service (such as Gemini), it will give you a different key from your keystore.

The result? Say you have 5 Gmail accounts and get a free API key for each in AI Studio. With the key mixer, your Multimodal Live rate limits are increased 5x: instead of 3 concurrent conversations, you can now have 15.
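For anyone who wants to see what round-robin rotation amounts to, here is a minimal TypeScript sketch of the idea. It is not the key-mixer API: `createKeyRotator`, `maxConcurrentSessions`, and the key names are hypothetical and only for illustration, and the capacity figure assumes the limit of 3 concurrent sessions really is enforced per key.

```typescript
// Hypothetical sketch of round-robin key rotation. This is NOT the key-mixer
// API; all names here are made up for illustration.
function createKeyRotator(keys: readonly string[]): () => string {
  if (keys.length === 0) {
    throw new Error("at least one API key is required");
  }
  let next = 0;
  // Each call hands out the next key and wraps around at the end of the list.
  return () => {
    const key = keys[next] as string;
    next = (next + 1) % keys.length;
    return key;
  };
}

// Total concurrent sessions, assuming the limit applies per key.
function maxConcurrentSessions(keyCount: number, sessionsPerKey: number): number {
  return keyCount * sessionsPerKey;
}

// Placeholder key names; real keys would come from your keystore file.
const nextKey = createKeyRotator(["KEY_A", "KEY_B", "KEY_C", "KEY_D", "KEY_E"]);
const firstSix = Array.from({ length: 6 }, () => nextKey());

console.log(firstSix.join(",")); // KEY_A,KEY_B,KEY_C,KEY_D,KEY_E,KEY_A
console.log(maxConcurrentSessions(5, 3)); // 15
```

The sixth call wraps back to the first key, which is all "round robin" means here. Note that rotation only spreads new sessions across keys; it does not track how many sessions each key currently has open.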

  • Note: I have only tested with small numbers of keys, 5 or fewer. If you need, say, 150 concurrent connections, DO NOT simply rotate 50 Gemini keys and expect it to work for more than a short time. You would also need your own infrastructure to spread those connections across different servers with different IP addresses, so that they do not trip any automated security mechanisms. Alternatively, if it suits your use case, you could originate the connections client side, from the user’s browser, which avoids this issue altogether. Just be prepared to invalidate keys frequently, because they won’t be secure and others will start using them.

Thank you for sharing this brilliant solution and your Key Mixer library! It’s a smart and practical approach to extend API rate limits. I really appreciate you taking the time to share it with the community.