When using the Python SDK to access the Gemini API, I get 500 InternalServerError responses when I asynchronously send more than ~15 requests per minute to a fine-tuned model. The way the error is triggered makes it look like a rate-limiting problem, but as far as I know, fine-tuned models have the same rate limits as the model they’re based on. I’m on a paid plan and the model is based on Gemini 1.5 Flash, so the limit should be 2000 requests per minute. The documentation on 500 errors isn’t helpful in the slightest, and I can’t find anyone else who has this problem.
The inputs are very short, only a few words at most.
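A possible stopgap, assuming the 500s are transient and load-related rather than a hard limit: cap how many requests are in flight with an asyncio.Semaphore and retry failed calls with exponential backoff. This is a minimal sketch, not a fix for the underlying error. The helpers call_with_backoff and run_all are hypothetical names, and the code does not call the SDK itself; you pass in the coroutine function that makes the request.

```python
import asyncio
import random
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")


async def call_with_backoff(
    fn: Callable[[], Awaitable[T]],
    retry_on: tuple[type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
) -> T:
    """Await fn(), retrying on the given exception types with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 2**attempt seconds plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


async def run_all(
    prompts: Sequence[str],
    send: Callable[[str], Awaitable[T]],
    retry_on: tuple[type[BaseException], ...],
    max_concurrency: int = 4,
    **backoff_kwargs,
) -> list[T]:
    """Send every prompt with at most max_concurrency requests in flight.

    Results come back in the same order as the prompts.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def one(prompt: str) -> T:
        # Retries happen while holding the semaphore, so they count toward the cap.
        async with sem:
            return await call_with_backoff(lambda: send(prompt), retry_on, **backoff_kwargs)

    return list(await asyncio.gather(*(one(p) for p in prompts)))
```

With the google-generativeai package, send would presumably be a tuned model's generate_content_async method and retry_on would be (google.api_core.exceptions.InternalServerError,); both are assumptions about how the requests are currently made. If the errors persist even at a concurrency of 1 or 2, that would point away from rate limiting.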