Sudden drastic degradation in latency and error rates with Gemini 2.0 Flash

Hello,

I’ve been using the Gemini 2.0 Flash model with the OpenAI client for some time now.

My prompts contain a single image plus text, averaging about 3,000 tokens per request. I created the API key in Google AI Studio, but my GCP account has a long billing history and is well into Tier 2 qualifications.
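For context, this is roughly how I make the requests (the file name and prompt text are placeholders; the base URL and model name are the ones from the OpenAI-compatibility documentation):

```python
import base64
import os
from openai import OpenAI

# OpenAI client pointed at the Gemini OpenAI-compatible endpoint,
# using the key created in Google AI Studio.
client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

# One image per request, sent inline as a base64 data URL.
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```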

Since Feb 25th, I’ve been seeing a huge increase in latency and error rates coming from the Gemini API:

About 20% of requests are failing, p99 latency is now 8 minutes, and the average response time is up to 40 seconds.
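These numbers come from a simple timing wrapper around each call, roughly like the sketch below (the helper and variable names are just for illustration):

```python
import statistics
import time

latencies, failures = [], 0

def timed_call(client, **request_kwargs):
    """Time one chat.completions call; record its latency and whether it failed."""
    global failures
    start = time.monotonic()
    try:
        client.chat.completions.create(**request_kwargs)
    except Exception:
        # Count any API error (429, 5xx, timeouts) as a failure.
        failures += 1
    finally:
        latencies.append(time.monotonic() - start)

# After a batch of calls:
# error_rate = failures / len(latencies)
# avg_latency = statistics.mean(latencies)
# p99_latency = statistics.quantiles(latencies, n=100)[98]
```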

Is there some significant reason why an API key created in Google AI Studio might be deprioritized? I'm not seeing a degradation notice for Gemini on the GCP status page, which leads me to believe only my API key is affected.

I also looked into my usage, and it's nowhere close to the RPM/TPM limits for Tier 1.

On a side note, the quota details page for Gemini 2.0 usage is broken in the GCP console. I'm using the GCP API quotas dashboard to try to view these stats and find the culprit:

I thought Gemini 2.0 was production-ready. Is there a different way I should create an API key, or can I prepurchase credits, to avoid silent rate limiting or get better performance?

This forum doesn’t allow multiple screenshots per post, so here are the Gemini usage details failing to load in the Quotas section: