I have a fine-tuned model based on gemini-1.5-flash that I want to call through the API. Thinking that the RPM limit was 2000, I set my code to make about 1500 requests per minute. Long story short, I got a lot of RESOURCE_EXHAUSTED errors and soon nothing but RESOURCE_EXHAUSTED errors, even at a much lower RPM and with very patient retry logic. I then gave it a 12-hour rest and made a single request: RESOURCE_EXHAUSTED.
I have since learned that the RPM limit is very likely 360 for this model. Fine, I can work with that. But when can I resume using the model at the lower rate? Am I being penalised for X amount of time for exceeding the quota, or is there some other limitation in place? For example, could I also have hit some undocumented daily request cap?
Can anyone shed some light on this?
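For reference, here is a minimal sketch of the kind of throttling I plan to use once the model responds again, assuming the limit really is 360 RPM. `send` is a hypothetical callable that wraps the actual API request, and `ResourceExhausted` is only a stand-in for whatever exception the client library raises on RESOURCE_EXHAUSTED; neither is a real SDK name.

```python
import random
import time


class ResourceExhausted(Exception):
    """Hypothetical stand-in for the client's RESOURCE_EXHAUSTED error."""


class Pacer:
    """Spaces calls evenly so that at most `rpm` calls start per minute."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm  # seconds between request starts
        self.clock = clock
        self.sleep = sleep
        self.next_slot = clock()    # earliest time the next call may start

    def wait(self):
        # Block until the next free slot, then reserve the one after it.
        delay = self.next_slot - self.clock()
        if delay > 0:
            self.sleep(delay)
        self.next_slot = max(self.clock(), self.next_slot) + self.interval


def call_with_backoff(send, pacer, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call send() under the pacer, backing off exponentially on quota errors."""
    for attempt in range(max_attempts):
        pacer.wait()
        try:
            return send()
        except ResourceExhausted:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay))
```

Usage would be something like `pacer = Pacer(rpm=360)` shared by all workers, then `call_with_backoff(my_request, pacer)` for each request. This only keeps me under the per-minute rate; it would not help if there is also a daily cap or a cool-down period, which is what I am asking about.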