Unexpected Delay in Gemini-1.5-Flash API Responses

Hello,

A few days ago, I began experiencing significant delays in Gemini-1.5-Flash API responses. Requests that previously took 2-3 seconds are now taking 60-90 seconds. These extended response times make my application unusable.

In contrast, similar requests on Google AI Studio continue to receive responses at their previous speed.

I would appreciate your assistance in identifying the cause of these delays to help me debug and resolve this critical issue.

Thank you for your help.

Similar problem. I wonder if the Pro model’s API has also been affected.

I found a workaround that returns API response times to normal for my app:

  1. Use a different Gemini 1.5 Flash model: either 1.5 Flash 002 or 1.5 Flash-8b. You can do this manually by changing the model_name parameter from gemini-1.5-flash to gemini-1.5-flash-002 or gemini-1.5-flash-8b, or via Google AI Studio by selecting one of those models and using the Get code button to retrieve the updated code.
  2. Both alternative models cap top_k at 40 instead of 64. Make sure to update this parameter, or you’ll receive the following error:

google.api_core.exceptions.InvalidArgument: 400 Unable to submit request because it has a topK value of 64 but the supported range is from 1 (inclusive) to 41 (exclusive). Update the value and try again.

  3. Test the results with both alternative models. For my app, both had similar API response times, but gemini-1.5-flash-002 followed the system instruction better than gemini-1.5-flash-8b.
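
The steps above can be sketched as follows. This is only an illustration: the helper function is my own, not part of the google-generativeai SDK, and the supported top_k range [1, 41) is taken from the 400 error quoted above.

```python
# Sketch of the workaround: switch to an alternative 1.5 Flash model and
# cap top_k at 40. The range check mirrors the 400 error quoted above;
# build_generation_config is a hypothetical helper, not an SDK function.

ALT_MODELS = ("gemini-1.5-flash-002", "gemini-1.5-flash-8b")
SUPPORTED_TOP_K = range(1, 41)  # per the error: 1 (inclusive) to 41 (exclusive)

def build_generation_config(model_name: str, top_k: int = 64) -> dict:
    """Return request kwargs, clamping top_k for the alternative models."""
    if model_name in ALT_MODELS and top_k not in SUPPORTED_TOP_K:
        top_k = 40  # the alternative models reject the old default of 64
    return {"model_name": model_name, "generation_config": {"top_k": top_k}}

config = build_generation_config("gemini-1.5-flash-002")
print(config["generation_config"]["top_k"])  # 40
```

The resulting dict can then be unpacked into the SDK's model constructor (e.g. `genai.GenerativeModel(**config)`), so the clamping happens before the request is submitted rather than failing server-side with the 400 error.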

I wish we could be notified about issues like this in advance. I don’t even know whether this slowdown in gemini-1.5-flash is a bug or an intentional move to steer developers toward newer models. I’d appreciate it if someone from the Google/Gemini team could respond and consider adding such notifications.