Extreme latency on gemini-1.5-flash API

Hey,
Since 17th November, I’ve noticed extremely high latency on my gemini-1.5-flash API calls. Response times have shot up from 14s to ~80s. Any pointers on why this is happening? Is the Pro model’s API also affected? I read that it is due to Google allocating resources to a new experimental model. If so, when will the latency go back to normal?

3 Likes

Yes, I’ve noticed this too. Even simple text generations take a lot of time. Latency has increased almost 10x for me.

1 Like

I’m also having the same issue (free tier).
A few days ago, I began experiencing significant delays in Gemini-1.5-Flash API responses. Requests that previously took 2-3 seconds are now taking 60-90 seconds. These extended response times make my application unusable.

In contrast, similar requests on Google AI Studio continue to receive responses at their previous speed.

1 Like

I’ve also experienced higher latency with the “gemini-1.5-flash-002” model than with “gemini-1.5-flash-001”. Their documentation says that latency is reduced by 3x, but in practice it’s the opposite! Many times, the “gemini-1.5-flash-002” model doesn’t respond for 10 minutes and then fails with “500 Internal server error”, which is the worst thing anyone can expect in production.
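
As a client-side mitigation for the hang-then-500 pattern described above, a retry wrapper with exponential backoff may help. This is only a minimal sketch, not an official fix. `send_request` is a hypothetical callable standing in for whatever code makes the API call; it is assumed to raise an exception on a server error or on a timeout configured in your own HTTP client. The wrapper also reports the latency of the successful attempt, so slow responses can be logged.

```python
import time


def call_with_retries(send_request, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send_request(), retrying on any exception with exponential backoff.

    send_request is a hypothetical zero-argument callable (an assumption of
    this sketch); it should enforce its own timeout and raise on failure.
    Returns (result, latency_in_seconds, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        start = time.perf_counter()
        try:
            result = send_request()
            # Latency covers only the attempt that succeeded.
            return result, time.perf_counter() - start, attempt
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Retrying does not make a slow backend faster. It only keeps one stuck request from blocking the application for minutes, provided the request itself is given a short timeout.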