[BUG] Persistent 429 Errors on Paid Tier 1 Despite <10% Quota Usage (gemini-2.5-flash)

Hello,

I’m experiencing persistent 429 RESOURCE_EXHAUSTED errors with gemini-2.5-flash on a Paid Tier 1 account, despite being well under all quota limits.

Environment:

Quota Status (from Cloud Console, gemini-2.5-flash):

  • RPM limit: 1,000 — Peak usage: ~8.6% (~86 RPM)
  • TPM limit: 1,000,000 — usage well below limit

Error:

HTTP 429 Too Many Requests
retry_after: <empty>
request_id: <empty>
body: <empty>

Use Case:
I am running a benchmark system with a multi-agent pipeline. Each benchmark task spawns ~6 agents in parallel, and each agent makes 1 API call, so a task issues ~6 concurrent requests. I run only 1 task at a time. Each request uses ~10K to ~25K tokens.
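
As a rough check on the token side, here is a small sketch of the worst-case load from one task. It assumes all ~6 agents send the maximum ~25K tokens at the same moment; the constant and function names are illustrative only.

```python
# Worst-case token burst for one benchmark task, using the figures above.
# Assumption: every agent sends the largest request (~25K tokens) at once.
AGENTS_PER_TASK = 6
MAX_TOKENS_PER_REQUEST = 25_000
TPM_LIMIT = 1_000_000  # tokens per minute, as shown in Cloud Console


def burst_tokens(agents: int = AGENTS_PER_TASK,
                 tokens_per_request: int = MAX_TOKENS_PER_REQUEST) -> int:
    """Tokens submitted when all agents of one task fire together."""
    return agents * tokens_per_request


def tpm_fraction(tasks_per_minute: int = 1) -> float:
    """Share of the TPM limit used by the given number of tasks per minute."""
    return tasks_per_minute * burst_tokens() / TPM_LIMIT


if __name__ == "__main__":
    print(burst_tokens())   # 150000
    print(tpm_fraction())   # 0.15
```

Under these assumptions a single task accounts for about 15% of the per-minute token limit, so the sustained rate depends on how many tasks complete per minute.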

What I’ve confirmed:
✅ Billing is active (Paid Tier 1)
✅ Cloud Console shows RPM peak usage at only ~8.6% of the limit
✅ TPM is well within limits
✅ Issue persists even with delays between requests
✅ retry_after header is empty, so there is no guidance on when to retry
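
For anyone trying to reproduce this, below is a minimal sketch of the kind of client-side retry that applies when retry_after comes back empty. It uses the server hint if one is present and otherwise falls back to exponential backoff with full jitter. `call_api` is a hypothetical placeholder returning `(status_code, headers, body)`; it is not a Gemini SDK function, and the default values are assumptions.

```python
import random
import time


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after:  # prefer the server hint when it is present and numeric
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # e.g. an HTTP-date value; fall through to backoff
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)]
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_retry(call_api, max_attempts=5, sleep=time.sleep):
    """Retry `call_api` on HTTP 429.

    `call_api` is hypothetical: it takes no arguments and returns
    (status_code, headers_dict, body).
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, headers, body = call_api()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, headers.get("retry-after")))
    return status, body
```

In my case even spaced-out requests return 429, so a loop like this only reduces the failure rate; it does not explain the throttling.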

Hypothesis:
This appears to match the bug reported in this forum (Dec 2025) in which paid tier projects are incorrectly throttled. One possible cause is that gemini-2.5-flash is still treated as a Preview model, with a separate hidden quota that does not align with the paid tier limits shown in Cloud Console.

I have seen similar reports from other users where Google staff confirmed an internal bug and pushed a fix. Is my project similarly affected?

Happy to share my project number via direct message if that helps investigation.

Thank you!