Tier 1 / AI Studio — gemini-2.5-flash 429 RESOURCE_EXHAUSTED at <100 tokens with ~0% dashboard usage

Hi team,

I’m on Tier 1 (active billing) in AI Studio, calling gemini-2.5-flash through the standard endpoint. Today, every non-trivial request hits 429 RESOURCE_EXHAUSTED, while the AI Studio quota dashboard shows usage of only RPM 31 / 1K, TPM 11.12K / 1M, and RPD 405 / 10K. Two API keys on the same project are both affected. The same workload was healthy yesterday under sustained load with multi-thousand-token prompts.
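For reference, here is a quick sketch of how far below the limits those dashboard figures sit. The numbers are the ones quoted above; the helper name `percent_used` is just illustrative.

```python
# (used, limit) pairs copied from the AI Studio quota dashboard figures above.
usage = {
    "RPM": (31, 1_000),
    "TPM": (11_120, 1_000_000),
    "RPD": (405, 10_000),
}


def percent_used(used, limit):
    """Return usage as a percentage of the limit."""
    return 100 * used / limit


for name, (used, limit) in usage.items():
    print(f"{name}: {percent_used(used, limit):.2f}% of limit")
```

Every metric comes out under 5% of its limit, which is why the 429s do not look like ordinary quota exhaustion.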

The gate tightens with each retry rather than refilling with idle time - clearly anti-abuse, not a published TPM bucket. Empirical input-token ceiling per request, measured by sending a single isolated request and reading promptTokenCount from usageMetadata:

Time today | After what traffic         | OLD key passes ≤ | NEW key passes ≤
-----------|----------------------------|------------------|-----------------
~10:00     | (idle overnight)           | ~600 tokens      | ~120 tokens
~19:00     | one batch of ~20 reqs      | ~50 tokens       | ~10 tokens
~20:30     | another batch of ~180 reqs | ~10 tokens       | ~10 tokens
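For anyone who wants to reproduce the measurement, below is a minimal sketch of the probing approach described above: one isolated request per prompt size, reading promptTokenCount from usageMetadata. It assumes the public REST generateContent endpoint with the x-goog-api-key header. The function names (`send_via_rest`, `probe_ceiling`) and the filler prompt are hypothetical, not the exact script used here. Because the gate appears to tighten with each rejected request, the probe stops at the first 429 instead of searching further.

```python
import json
import urllib.error
import urllib.request

# Assumed endpoint: public REST generateContent for the model named in this post.
URL = ("https://generativelanguage.googleapis.com/v1beta/"
       "models/gemini-2.5-flash:generateContent")


def send_via_rest(prompt, api_key):
    """Send one request; return (http_status, promptTokenCount or None)."""
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]}).encode()
    req = urllib.request.Request(
        URL,
        data=body,
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
    )
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            payload = json.load(resp)
            tokens = payload.get("usageMetadata", {}).get("promptTokenCount")
            return resp.status, tokens
    except urllib.error.HTTPError as err:
        return err.code, None  # e.g. 429 RESOURCE_EXHAUSTED


def probe_ceiling(send, word_counts):
    """Try prompts of increasing size, one request each, stopping at the first 429.

    `send` is any callable taking a prompt and returning (status, token_count).
    Returns the largest promptTokenCount that succeeded, or None if none did.
    """
    largest_ok = None
    for n in sorted(word_counts):
        status, tokens = send("word " * n)  # filler prompt of n repeated words
        if status == 429:
            break  # stop early: further retries seem to tighten the gate
        if status == 200 and tokens is not None:
            largest_ok = tokens
    return largest_ok
```

A call would look like `probe_ceiling(lambda p: send_via_rest(p, API_KEY), [5, 50, 500])`, where `API_KEY` is your own key. Keeping the list of sizes short limits how much the probing itself adds to the traffic being measured.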

Reproduces with serviceTier: "standard" AND serviceTier: "flex" (response header confirms x-gemini-service-tier: flex, still 429 at ~6K input tokens). Not service-tier-scoped - looks account/project-level.
Tiny prompts (≤10 tokens) still succeed, so this is clearly not RPD / daily-quota exhaustion.

Asks:

  1. Is there a documented path for releasing the compounding cooldown short of waiting?

Context

This is a simple PoC RAG system - exactly the kind of workload AI Studio markets to. I’ve built equivalent PoC RAG pipelines on the OpenAI API and never encountered this kind of opaque, compounding rate-limit behaviour. The current situation - where the same workload that ran cleanly yesterday is now structurally unable to issue a single useful request, with almost 0% reported usage and no published mechanism explaining why - makes it genuinely hard to do honest R&D on this platform, let alone consider Gemini for production. Visibility into what’s actually limiting the project would help a lot.

Thanks!

Hi Dmytro,

Please add your details to this form & we’ll check it out.

Your response came too late. I had already spent time on this issue, and it caused me to miss the deadline for this task.

I’m disappointed with this level of service and won’t use Gemini for this task anymore. After the time lost and the impact it caused, I don’t want to spend more time filling out forms.

For now, I’ve stopped using your models.