Hello,
I’m experiencing persistent 429 RESOURCE_EXHAUSTED errors with gemini-2.5-flash on a Paid Tier 1 account, despite being well under all quota limits.
Environment:
- Model: gemini-2.5-flash
- API Endpoint: https://generativelanguage.googleapis.com/v1beta/openai/v1/chat/completions (OpenAI-compatible)
- Billing: Paid Tier 1
Quota Status (from Cloud Console, gemini-2.5-flash):
- RPM limit: 1,000 — Peak usage: ~8.6% (~86 RPM)
- TPM limit: 1,000,000 — usage well below limit
Error:
HTTP 429 Too Many Requests
retry_after: <empty>
request_id: <empty>
body: <empty>
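
To rule out a client library swallowing details, a minimal sketch of how the full 429 response could be captured with only the Python standard library. The helper name `describe_http_error` is hypothetical, and the error object below is synthetic (built locally, no request is sent). It dumps every header rather than guessing which one carries a request ID.

```python
import io
import urllib.error
from email.message import Message


def describe_http_error(err: urllib.error.HTTPError) -> dict:
    """Collect everything the server returned with an HTTP error."""
    body = err.read().decode("utf-8", errors="replace")
    return {
        "status": err.code,
        "reason": err.reason,
        # Retry-After is a standard HTTP header; None means it was not sent.
        "retry_after": err.headers.get("Retry-After"),
        # Keep all headers so nothing useful is discarded.
        "headers": dict(err.headers.items()),
        "body": body,
    }


# Synthetic 429 with an empty body, mirroring the response described above.
synthetic_headers = Message()
synthetic_headers["Content-Length"] = "0"
synthetic_error = urllib.error.HTTPError(
    url="https://generativelanguage.googleapis.com/v1beta/openai/v1/chat/completions",
    code=429,
    msg="Too Many Requests",
    hdrs=synthetic_headers,
    fp=io.BytesIO(b""),
)

report = describe_http_error(synthetic_error)
print(report)
```

In a real client, the same helper would be called from an `except urllib.error.HTTPError as err:` block around `urllib.request.urlopen(...)`.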
Use Case:
I am running a benchmark system with a multi-agent pipeline. Each benchmark task spawns ~6 agents in parallel, each making 1 API call. I run only 1 task at a time, so at most ~6 requests are in flight concurrently. Token usage per request ranges from ~10K to ~25K tokens.
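
As a rough cross-check, the figures in this post can be multiplied out. This is a sketch, not a measurement: it assumes every request in the peak minute falls inside the stated 10K to 25K token range, and the function names are hypothetical.

```python
# Figures taken from this post; none of these are measured by the script.
AGENTS_PER_TASK = 6                      # parallel agents per benchmark task
TOKENS_PER_REQUEST = (10_000, 25_000)    # (min, max) tokens per request
PEAK_RPM = 86                            # ~8.6% of the 1,000 RPM limit
TPM_LIMIT = 1_000_000


def tokens_per_burst(agents: int, token_range: tuple) -> tuple:
    """Tokens consumed by one task's parallel burst, as (low, high)."""
    low, high = token_range
    return agents * low, agents * high


def implied_tpm(rpm: int, token_range: tuple) -> tuple:
    """Tokens per minute implied by a request rate, as (low, high)."""
    low, high = token_range
    return rpm * low, rpm * high


burst_low, burst_high = tokens_per_burst(AGENTS_PER_TASK, TOKENS_PER_REQUEST)
tpm_low, tpm_high = implied_tpm(PEAK_RPM, TOKENS_PER_REQUEST)

print(f"Tokens per burst: {burst_low:,} to {burst_high:,}")
print(f"Implied TPM at peak RPM: {tpm_low:,} to {tpm_high:,} (limit {TPM_LIMIT:,})")
```

Under these assumptions a single burst is 60,000 to 150,000 tokens, and the peak minute works out to 860,000 to 2,150,000 tokens. The upper bound exceeds the 1,000,000 TPM limit, so the TPM chart for the same minute as the RPM peak is worth cross-checking, even though the Console reports TPM usage well below the limit.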
What I’ve confirmed:
- Billing is active (Paid Tier 1)
- Cloud Console shows RPM peak usage at only 8.6% of the limit
- TPM is well within limits
- The issue persists even with delays between requests
- The retry_after header is empty, so there is no guidance on when to retry
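
Because no retry hint is returned, the client has to pick its own delay. A minimal sketch of one common approach, exponential backoff with full jitter, used only as a fallback when no usable `Retry-After` value is present. The function name, `base`, and `cap` are hypothetical defaults, not values recommended by the API.

```python
import random


def retry_delay(headers: dict, attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    value = lowered.get("retry-after")
    if value:
        try:
            # Only the delta-seconds form is handled here.
            return max(0.0, float(value))
        except ValueError:
            pass  # HTTP-date form or unparsable value: fall back to backoff
    # Full jitter: uniform between 0 and the capped exponential ceiling.
    ceiling = min(cap, base * 2 ** attempt)
    return random.uniform(0.0, ceiling)


# With an empty header set (as in the responses described above),
# the delay is drawn from the backoff window for that attempt.
for attempt in range(4):
    print(attempt, round(retry_delay({}, attempt), 2))
```

The jitter matters for this workload: six agents that fail together would otherwise all retry at the same instant and recreate the burst.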
Hypothesis:
This appears to match the known bug reported in this forum (Dec 2025), in which paid tier projects are incorrectly throttled. A possible cause is that gemini-2.5-flash is still in Preview and has a separate hidden quota that doesn’t align with the paid tier limits shown in Cloud Console.
I have seen similar reports from other users where Google staff confirmed an internal bug and pushed a fix. Is my project similarly affected?
Happy to share my project number via direct message if that helps investigation.
Thank you!