A few days ago, I began experiencing significant delays in Gemini-1.5-Flash API responses. Requests that previously took 2-3 seconds are now taking 60-90 seconds. These extended response times make my application unusable.
In contrast, similar requests on Google AI Studio continue to receive responses at their previous speed.
I would appreciate your assistance in identifying the cause of these delays to help me debug and resolve this critical issue.
I found a workaround that restores normal API response times for my app:
Switch to a different Gemini 1.5 Flash variant: either 1.5 Flash 002 or 1.5 Flash-8B. You can do this manually by changing the model_name parameter from gemini-1.5-flash to gemini-1.5-flash-002 or gemini-1.5-flash-8b, or via Google AI Studio by selecting gemini-1.5-flash-002 or gemini-1.5-flash-8b as your model and using the Get code button to copy the updated code.
Both alternative models cap the top_k value at 40 instead of 64. Make sure you change this parameter, or you'll receive the following error:
google.api_core.exceptions.InvalidArgument: 400 Unable to submit request because it has a topK value of 64 but the supported range is from 1 (inclusive) to 41 (exclusive). Update the value and try again.
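To avoid that error when switching models, you can clamp top_k before building the request. A minimal sketch in plain Python; the ceiling of 40 comes from the error message above, while the helper name and config keys are my own illustration, not part of the SDK:

```python
# The 400 error says the supported top_k range for the alternative models
# is [1, 41), i.e. at most 40; gemini-1.5-flash defaulted to 64.
MAX_TOP_K = 40

def adjust_generation_config(config: dict) -> dict:
    """Return a copy of a generation_config dict with top_k clamped
    to the range the -002 and -8b models accept."""
    adjusted = dict(config)
    if adjusted.get("top_k", 0) > MAX_TOP_K:
        adjusted["top_k"] = MAX_TOP_K
    return adjusted

# A config originally written for gemini-1.5-flash (top_k=64) ...
old_config = {"temperature": 1, "top_p": 0.95, "top_k": 64}
# ... becomes safe for gemini-1.5-flash-002 / gemini-1.5-flash-8b:
new_config = adjust_generation_config(old_config)
print(new_config["top_k"])  # → 40
```

The adjusted dict can then be passed as the generation_config alongside the new model_name, so the rest of the calling code stays unchanged.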
Test the results with both alternative models. For my app, both had similar API response times, but gemini-1.5-flash-002 followed the system instruction better than gemini-1.5-flash-8b.
I wish we were notified about such issues in advance. I don't even know whether this slowdown in gemini-1.5-flash is a bug or an intentional move to steer developers toward newer models. I'd appreciate it if someone from the Google/Gemini team could respond and consider adding such notifications.