Tier 1 / AI Studio — gemini-2.5-flash 429 RESOURCE_EXHAUSTED at <100 tokens with ~0% dashboard usage

Hi team,

I’m on Tier 1 (active billing) in AI Studio, calling gemini-2.5-flash through the standard endpoint. Today, every request hits 429 RESOURCE_EXHAUSTED, while the AI Studio quota dashboard shows usage of RPM 31 / 1K, TPM 11.12K / 1M, and RPD 405 / 10K. Two API keys on the same project are both affected. The same workload ran cleanly yesterday at sustained throughput with multi-thousand-token prompts.

The gate tightens with each retry instead of refilling during idle time, which points to an anti-abuse mechanism rather than any published TPM bucket. Here is the empirical per-request input-token ceiling, measured by sending a single isolated request and reading promptTokenCount from usageMetadata:

| Time today | After what traffic | OLD key passes ≤ | NEW key passes ≤ |
|---|---|---|---|
| ~10:00 | idle overnight | ~600 tokens | ~120 tokens |
| ~19:00 | one batch of ~20 reqs | ~50 tokens | ~10 tokens |
| ~20:30 | another batch of ~180 reqs | ~10 tokens | ~10 tokens |
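The ceilings above were found by probing manually; a minimal sketch of automating the measurement, assuming an API_KEY environment variable and the public v1beta generateContent REST endpoint (the bisection helper itself is generic and makes no network calls):

```python
import json
import os
import urllib.error
import urllib.request

URL = ("https://generativelanguage.googleapis.com/v1beta/"
       "models/gemini-2.5-flash:generateContent")

def request_passes(n_words: int) -> bool:
    """Send one isolated request with ~n_words words of filler.

    Returns True on HTTP 200, False on 429. The exact token count
    should be read back from usageMetadata.promptTokenCount in the
    response body rather than assumed from the word count.
    """
    payload = {"contents": [{"parts": [{"text": "word " * n_words}]}]}
    req = urllib.request.Request(
        URL + "?key=" + os.environ["API_KEY"],
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as e:
        if e.code == 429:
            return False
        raise

def bisect_ceiling(passes, lo: int, hi: int) -> int:
    """Largest n in [lo, hi] for which passes(n) is True.

    Assumes passes is monotone: True at and below the ceiling,
    False above it.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if passes(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

`bisect_ceiling(request_passes, 1, 8192)` then yields the empirical ceiling. One caveat: each probe is itself traffic, so under a compounding gate the measurement can tighten the very limit it is measuring; that is consistent with the declining numbers in the table.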

Reproduces with serviceTier: "standard" AND serviceTier: "flex" (the response header confirms x-gemini-service-tier: flex, yet the request still 429s at ~6K input tokens), so the limit is not service-tier-scoped - it looks account- or project-level.
Tiny prompts (≤10 tokens) still succeed, so this is clearly not RPD / daily-quota exhaustion.
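For reference, here is a sketch of the flex-tier repro. The serviceTier request field and the x-gemini-service-tier response header are taken from the behaviour observed above, not from documentation, so treat both names and the field's placement as assumptions:

```python
import json
import os
import urllib.request

URL = ("https://generativelanguage.googleapis.com/v1beta/"
       "models/gemini-2.5-flash:generateContent")

def build_payload(prompt: str, tier: str) -> dict:
    # serviceTier as a top-level request field, mirroring the repro
    # above (assumption: name and placement as observed, not from
    # official docs).
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "serviceTier": tier,
    }

def echoed_tier(prompt: str, tier: str):
    """Send the request and return the x-gemini-service-tier response
    header (observed in the repro), or None if the header is absent."""
    req = urllib.request.Request(
        URL + "?key=" + os.environ["API_KEY"],
        data=json.dumps(build_payload(prompt, tier)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("x-gemini-service-tier")
```

In the failing case, `echoed_tier(long_prompt, "flex")` never returns: the call raises an HTTPError 429 even though the same header on successful small requests confirms the flex tier was applied.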

Asks:

  1. Is there a documented way to reset the compounding cooldown, short of simply waiting it out?

Context

This is a simple PoC RAG system - exactly the kind of workload AI Studio markets to. I’ve built equivalent PoC RAG pipelines on the OpenAI API and never encountered this kind of opaque, compounding rate-limit behaviour. The current situation - where the same workload that ran cleanly yesterday is now structurally unable to issue a single useful request, with almost 0% reported usage and no published mechanism explaining why - makes it genuinely hard to do honest R&D on this platform, let alone consider Gemini for production. Visibility into what is actually limiting the project would help a lot.

Thanks!