Hi team,
I’m on Tier 1 (active billing) in AI Studio, calling gemini-2.5-flash through the standard endpoint. Today, every request hits 429 RESOURCE_EXHAUSTED, while the AI Studio quota dashboard shows usage at RPM 31 / 1K, TPM 11.12K / 1M, and RPD 405 / 10K. Two API keys on the same project are both affected. The same workload ran cleanly yesterday at sustained throughput with multi-thousand-token prompts.
The gate tightens with each retry rather than refilling during idle time, which suggests an anti-abuse mechanism rather than a published TPM bucket. Here is the empirical input-token ceiling per request, measured by sending a single isolated request and reading promptTokenCount from usageMetadata:
| Time today | After what traffic | OLD key passes ≤ | NEW key passes ≤ |
|---|---|---|---|
| ~10:00 | (idle overnight) | ~600 tokens | ~120 tokens |
| ~19:00 | one batch of ~20 reqs | ~50 tokens | ~10 tokens |
| ~20:30 | another batch of ~180 reqs | ~10 tokens | ~10 tokens |
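For reproducibility, here is a minimal sketch of the probe behind the table above. It assumes the public REST generateContent endpoint and a GEMINI_API_KEY environment variable; the binary search and the pass/fail predicate are my own tooling, not an official utility, and the "word per token" padding is only a rough approximation.

```python
import json
import os
import urllib.error
import urllib.request

URL = ("https://generativelanguage.googleapis.com/v1beta/"
       "models/gemini-2.5-flash:generateContent")

def passes(n_tokens: int) -> bool:
    """Send one isolated request of roughly n_tokens input tokens.

    Returns True on HTTP 200, False on 429 RESOURCE_EXHAUSTED.
    """
    body = json.dumps(
        {"contents": [{"parts": [{"text": "ok " * n_tokens}]}]}
    ).encode()
    req = urllib.request.Request(
        URL + "?key=" + os.environ["GEMINI_API_KEY"],
        data=body,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            data = json.load(resp)
            # usageMetadata.promptTokenCount is the measured input size
            print(data["usageMetadata"]["promptTokenCount"])
            return True
    except urllib.error.HTTPError as e:
        if e.code == 429:
            return False
        raise

def find_ceiling(check, lo=1, hi=8192):
    """Binary-search the largest prompt size that still succeeds.

    check(n) must be a monotone pass/fail predicate like passes().
    Returns 0 if even the smallest probe is rejected.
    """
    if not check(lo):
        return 0
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if check(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

Running `find_ceiling(passes)` against each key, after each batch, produced the ceilings in the table.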
Reproduces with both serviceTier: "standard" and serviceTier: "flex" (the response header confirms x-gemini-service-tier: flex, yet the request still 429s at ~6K input tokens). So the limit is not service-tier-scoped; it looks account- or project-level.
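For completeness, this is how the flex-tier repro request body was shaped. The serviceTier field name is the one used in my failing requests, as described above; treat it as an assumption about the API surface, not cited documentation.

```python
import json

def build_request(prompt: str, service_tier: str = "flex") -> bytes:
    """JSON body for generateContent with an explicit serviceTier.

    serviceTier is the field name from my own requests (an assumption,
    not confirmed API documentation); everything else is the standard
    generateContent shape.
    """
    return json.dumps({
        "contents": [{"parts": [{"text": prompt}]}],
        "serviceTier": service_tier,
    }).encode()
```

After POSTing this body, I check the x-gemini-service-tier response header to confirm which tier actually served (or rejected) the call.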
Tiny prompts (≤10 tokens) still succeed, so this is clearly not RPD / daily-quota exhaustion.
Asks:
- Is there a documented path for releasing the compounding cooldown short of waiting?
Context
This is a simple PoC RAG system, exactly the kind of workload AI Studio markets to. I’ve built equivalent PoC RAG pipelines on the OpenAI API and never encountered this kind of opaque, compounding rate-limit behaviour. As it stands, the same workload that ran cleanly yesterday is now structurally unable to issue a single useful request, with almost 0% reported usage and no published mechanism explaining why. That makes it genuinely hard to do honest R&D on this platform, let alone consider Gemini for production. Visibility into what is actually limiting the project would help a lot.
Thanks!