Sudden drastic degradation in latency and error rates with Gemini 2.0 Flash

Hello,

I’ve been using the Gemini 2.0 Flash model with the OpenAI client for some time now.

My prompts contain a single image plus text, averaging about 3,000 tokens per request. I created the API key in Google AI Studio, but my GCP account has a long billing history and is well into Tier 2 qualifications.
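For context, this is roughly how I make the requests (the file name and prompt text are placeholders; the base URL and model name are the ones from the OpenAI-compatibility documentation):

```python
import base64
import os
from openai import OpenAI

# OpenAI client pointed at the Gemini OpenAI-compatible endpoint,
# using the key created in Google AI Studio.
client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

# One image per request, sent inline as a base64 data URL.
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```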

Since Feb 25th, I’ve been seeing a huge increase in latency and error rates coming from the Gemini API:

About 20% of requests are failing, p99 latency is now 8 minutes, and the average response time is up to 40 seconds.
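These numbers come from a simple timing wrapper around each call, roughly like the sketch below (the helper and variable names are just for illustration):

```python
import statistics
import time

latencies, failures = [], 0

def timed_call(client, **request_kwargs):
    """Time one chat.completions call; record its latency and whether it failed."""
    global failures
    start = time.monotonic()
    try:
        client.chat.completions.create(**request_kwargs)
    except Exception:
        # Count any API error (429, 5xx, timeouts) as a failure.
        failures += 1
    finally:
        latencies.append(time.monotonic() - start)

# After a batch of calls:
# error_rate = failures / len(latencies)
# avg_latency = statistics.mean(latencies)
# p99_latency = statistics.quantiles(latencies, n=100)[98]
```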

Is there some significant reason why an API key created in Google AI Studio might be deprioritized? I'm not seeing a degradation notice for Gemini on the GCP status page, which leads me to believe only my API key is affected.

I also looked into my usage, and it's nowhere close to the RPM/TPM limits for Tier 1.

On a side note, the quota details page for Gemini 2.0 usage is broken in the GCP console. I'm using the GCP API quotas dashboard to try to view these stats and find the culprit:

I thought Gemini 2.0 was production-ready. Is there a different way I should create an API key, or can I prepurchase credits, to avoid silent rate limiting or get better performance?

This forum doesn’t allow multiple screenshots per post, so here are the Gemini usage details failing to load in the Quotas section: