Paid Tier 1 Postpay user on the Gemini Developer API — every generateContent call has returned 429
RESOURCE_EXHAUSTED for 24+ hours, and I’ve exhausted every user-side fix. Posting here since Google Cloud Basic
Support can’t take technical cases, and the evidence strongly suggests this is server-side.
Key evidence this is NOT user quota:
Response header: x-gemini-service-tier: standard — the paid tier IS being recognized at the service layer.
Response body has no error.details[] array naming a quota metric. Normal quota 429s always name the
exhausted metric; this one doesn't, which points to an anti-abuse or routing-state throttle.
models.list on the same key returns 200 OK — auth and project are fully functional.
AI Studio Rate Limit dashboard shows peak usage nowhere near limits: 15 / 2000 RPM, 33.8K / 4M TPM, 134 /
Unlimited RPD.
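For anyone triaging a similar 429, that distinguishing signal is machine-checkable. A minimal sketch, assuming the standard google.rpc error model Google APIs use for quota violations (the two sample bodies are illustrative, not captured responses):

```python
import json

# Detail types that a normal per-metric quota 429 carries (google.rpc error model).
QUOTA_DETAIL_TYPES = {
    "type.googleapis.com/google.rpc.QuotaFailure",
    "type.googleapis.com/google.rpc.ErrorInfo",
}

def classify_429(body: str) -> str:
    """Return 'quota' if the 429 body names a quota metric, else 'opaque'."""
    err = json.loads(body).get("error", {})
    details = err.get("details", [])
    if any(d.get("@type") in QUOTA_DETAIL_TYPES for d in details):
        return "quota"
    # No metric named: consistent with an anti-abuse or routing-state throttle.
    return "opaque"

# Illustrative shape of a normal per-metric quota 429:
quota_body = json.dumps({"error": {"code": 429, "status": "RESOURCE_EXHAUSTED",
    "details": [{"@type": "type.googleapis.com/google.rpc.QuotaFailure",
                 "violations": [{"description": "..."}]}]}})
# Illustrative shape of the 429s described above (no details array):
opaque_body = json.dumps({"error": {"code": 429, "status": "RESOURCE_EXHAUSTED"}})

print(classify_429(quota_body))   # quota
print(classify_429(opaque_body))  # opaque
```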
Account details:
Billing: Tier 1 Postpay (Active, budget configured, health checks cleared, linked to project ~24h before
symptoms started)
Generative Language API is enabled on the project
API key prefix: AIzaSyBn... (happy to share the last 4 chars privately)
Minted a brand-new API key in the same project — still 429
Tried gemini-2.5-flash — still 429
Request rate during testing < 1 RPM
Suspected cause: billing-promotion-to-quota-enforcement lag, or an account-level anti-abuse throttle that was
applied before my Tier 1 upgrade fully propagated and is now stuck.
Can someone on the Gemini team check whether this project has a hold on it, or force-propagate the Tier 1 quota
state? Happy to share the project ID and last 4 of the API key via DM.
Update 2026-04-21 — narrowed to gemini-2.0-flash specifically. Same API key, same project, same minute. gemini-2.0-flash is still 429 while every other Gemini model I tried returns 200 on the same key:
Representative gemini-flash-latest success on the same key seconds later:
{
  "candidates": [{
    "content": { "parts": [{ "text": "How can I help you today?" }], "role": "model" },
    "finishReason": "STOP"
  }],
  "usageMetadata": {
    "promptTokenCount": 1,
    "candidatesTokenCount": 7,
    "totalTokenCount": 131,
    "thoughtsTokenCount": 123
  },
  "modelVersion": "gemini-3-flash-preview"
}
This matrix strongly suggests a stuck or orphaned quota counter scoped to gemini-2.0-flash on project gen-lang-client-0754187837 — not an account-wide hold, not a user-quota issue (other models served by the
same key, project, and billing tier work fine), and not a request-shape issue (all three curls are byte-identical
aside from the model path segment).
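The sweep is easy to reproduce. A sketch in Python rather than curl (API_KEY is a placeholder and the model list is whatever you want to compare; only the model path segment varies between requests):

```python
import json
import urllib.error
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: substitute a real key to actually probe
MODELS = ["gemini-2.0-flash", "gemini-2.5-flash", "gemini-flash-latest"]
BODY = json.dumps({"contents": [{"parts": [{"text": "Hi"}]}]}).encode()

def endpoint(model: str) -> str:
    # Only the model path segment changes between requests.
    return (f"https://generativelanguage.googleapis.com"
            f"/v1beta/models/{model}:generateContent")

def probe(model: str) -> int:
    """Send one minimal generateContent request and return the HTTP status."""
    req = urllib.request.Request(
        endpoint(model),
        data=BODY,
        headers={"Content-Type": "application/json", "x-goog-api-key": API_KEY},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # e.g. 429 on the stuck lane

if API_KEY != "YOUR_API_KEY":  # skip network calls until a real key is supplied
    for m in MODELS:
        print(m, probe(m))
```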
Can someone on the Gemini team look at the gemini-2.0-flash quota bucket for this specific project and clear or
re-sync it? Happy to run any additional variants (regions, -001 pinned version, gemini-2.0-flash-lite, etc.)
that would help.
In the meantime I’ve switched my app to gemini-2.5-flash to unblock. Leaving this open because the stuck
2.0-flash lane will presumably hit other Tier 1 users the same way.
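If anyone else hits a single-model 429 lane, the unblock can be as small as a fallback wrapper. A sketch with a stubbed transport standing in for the real API call (both function names here are hypothetical):

```python
def generate_with_fallback(prompt, call,
                           models=("gemini-2.0-flash", "gemini-2.5-flash")):
    """Try each model in order; fall through on 429, re-raise anything else."""
    for model in models:
        status, body = call(model, prompt)
        if status == 200:
            return model, body
        if status != 429:
            raise RuntimeError(f"{model}: HTTP {status}")
    raise RuntimeError("every model in the fallback chain returned 429")

# Stubbed transport mimicking the stuck 2.0-flash lane described above:
def stuck_lane(model, prompt):
    return (429, None) if model == "gemini-2.0-flash" else (200, "ok")

print(generate_with_fallback("Hi", stuck_lane))  # ('gemini-2.5-flash', 'ok')
```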
We hit this on Flash 2.0 and 2.5 on Tier 3. They are having an outage (see Google AI Studio). Our failure rate was about 50%, so watch out. The API should communicate this better instead of misleading us with 429 errors.
Google’s API has always been unreliable. We just moved to OpenAI on prod in the meantime while we decide which models and providers to settle on. The lesson is to have a fallback that’s not from the same provider.