Hello
I’ve been working with the Gemini API for a few weeks now, integrating it into my application for enhanced natural language processing tasks. While the API provides impressive capabilities, I’ve encountered issues when dealing with high volumes of requests. Specifically, the rate limiting behavior seems to be more aggressive than expected, resulting in delayed responses and occasionally even dropped requests.
I’ve tried implementing backoff strategies and retries using standard exponential backoff logic, but some responses come with little to no indication of how much time I need to wait before reattempting.
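For context, here is a minimal sketch of the kind of retry logic I mean. `RateLimitError` and `request_fn` are hypothetical placeholders, not part of the Gemini SDK. The sketch also assumes a rate-limited response may expose a wait hint in seconds (for example via an HTTP `Retry-After` header), which is the part I have not been able to confirm. When no hint is present, it falls back to capped exponential backoff with full jitter.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a rate-limit (HTTP 429 style) failure."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        # Seconds suggested by the server, or None if no hint was given (assumption).
        self.retry_after = retry_after


def backoff_delay(attempt, base=1.0, cap=60.0, retry_after=None, rng=random.random):
    """Return the number of seconds to wait before retrying.

    `attempt` is 0-based. A server-provided hint takes priority; otherwise
    the delay is a random value between 0 and min(cap, base * 2**attempt).
    """
    if retry_after is not None:
        return max(0.0, float(retry_after))
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def call_with_retries(request_fn, max_attempts=5, sleep=time.sleep, **delay_kwargs):
    """Call `request_fn` until it succeeds or `max_attempts` is exhausted."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except RateLimitError as err:
            # Give up and re-raise on the final attempt.
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, retry_after=err.retry_after, **delay_kwargs))
```

In practice `request_fn` would wrap the actual API call and translate the SDK's rate-limit error into `RateLimitError`. The jitter is there so that many clients retrying at once do not all hit the API at the same moment.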
The documentation mentions rate limits, but I’m wondering if there’s a better way to programmatically determine the wait times between requests, or any best practices that might help smooth the integration in a production environment. For reference, I have checked the Gemini API Cookbook (Google AI for Developers) and a MongoDB documentation guide.
I’m also curious if others have encountered similar issues and whether there are specific configuration settings or tweaks in the Gemini SDK that might help with this challenge. Any feedback or insights would be greatly appreciated.
Thank you!