Intermittent 429 RESOURCE_EXHAUSTED despite low quota usage (billing enabled)

cas_stg · January 16, 2026, 1:12pm

Dear Support Team,

We are intermittently experiencing 429 RESOURCE_EXHAUSTED errors from the Gemini / Vertex AI API in a paid project, even though all visible quotas are well within limits.

Error example:
429 RESOURCE_EXHAUSTED
{
  "error": {
    "code": 429,
    "message": "Resource exhausted. Please try again later. Please refer to https://cloud.google.com/vertex-ai/generative-ai/docs/error-code-429",
    "status": "RESOURCE_EXHAUSTED"
  }
}

Project ID: hopeful-breaker-454412-u9

Details:

Billing is enabled and active
Visible quota usage in the Cloud Console is consistently <1%
Errors occur a few times per hour under normal, steady production load (not very large bursts)
Retries are implemented with backoff and jitter (base ~3s + 2–5s jitter)
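For readers comparing retry setups, here is a minimal sketch of backoff with jitter along the lines described above (base ~3s plus 2–5s of jitter). It is not the poster's actual code. The `ResourceExhausted` class, the `send` callable, and the exponential `factor` are assumptions made for illustration; set `factor=1.0` to keep the base delay constant.

```python
import random
import time


class ResourceExhausted(Exception):
    """Hypothetical stand-in for whatever exception your client raises on HTTP 429."""


def backoff_delays(max_attempts=5, base=3.0, jitter=(2.0, 5.0), factor=2.0, rng=random):
    """Yield the wait before each retry: a growing base plus uniform jitter."""
    for attempt in range(max_attempts - 1):
        yield base * (factor ** attempt) + rng.uniform(*jitter)


def call_with_retry(send, max_attempts=5, sleep=time.sleep, **backoff):
    """Call send() until it succeeds or attempts run out, then re-raise the last 429."""
    delays = backoff_delays(max_attempts=max_attempts, **backoff)
    while True:
        try:
            return send()
        except ResourceExhausted:
            delay = next(delays, None)
            if delay is None:
                raise  # attempts exhausted, surface the 429 to the caller
            sleep(delay)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. In production you would wrap your actual API call in `send` and map your client's 429 exception to the `except` clause.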

Impact:
This is affecting a production application.

Request:
We suspect an undocumented internal quota, concurrency limit, token-throughput limit, or shared capacity throttling that is not exposed in the quota dashboard.

Could you please:

Confirm which internal quota or limit is being exceeded
Verify the project is not being enforced under free-tier or incorrect backend limits
Advise whether concurrency or capacity limits can be adjusted

Screenshots of the quota dashboard are attached.

Thank you for your assistance.

Organspendeausweis · January 18, 2026, 7:59am

We have the same issue. We are on Tier 1 and get error 429, but cannot track down the quota that caused it, because nothing shows up in “current usage percentage”.

ASHQKING · January 19, 2026, 4:58am

The 429 RESOURCE_EXHAUSTED error in this context typically stems from one of three causes that are not visible in the “Requests Per Minute” (RPM) view you shared.

Recommended Troubleshooting Steps

Step 1: Verify Token Usage (TPM)

Go back to the IAM & Admin > Quotas page:

Filter by the exact model causing errors (e.g., gemini-2.5-flash).
Look for “base_model_id_and_resolution: gemini-2.5-flash…-tokens-per-minute”.
Check: Is this bar spiking during your error windows? If so, you need to request a quota increase specifically for TPM, not RPM.
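If the console chart is too coarse to line up with your error windows, one option is to estimate TPM from your own request logs. The sketch below assumes you already record a timestamp and a total token count per request; the function names are hypothetical, and bucketing by calendar minute is an assumption that may not match how the quota system windows usage.

```python
from collections import defaultdict
from datetime import datetime


def tokens_per_minute(records):
    """Sum tokens per calendar minute from (timestamp, token_count) pairs."""
    buckets = defaultdict(int)
    for ts, tokens in records:
        # Truncate each timestamp to the start of its minute.
        buckets[ts.replace(second=0, microsecond=0)] += tokens
    return dict(buckets)


def peak_minute(records):
    """Return (minute, tokens) for the busiest minute, or None if there are no records."""
    buckets = tokens_per_minute(records)
    if not buckets:
        return None
    return max(buckets.items(), key=lambda kv: kv[1])
```

Comparing the peak minute against the timestamps of your 429 responses gives a rough indication of whether token throughput is a plausible cause.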

Step 2: Test a GA Model vs. Preview

If the errors are coming from gemini-3-flash-preview:

Action: Temporarily switch that traffic to gemini-1.5-pro-002 or gemini-1.5-flash-002 (Stable/GA versions).
Why: GA models have Service Level Agreements (SLAs) and reserved capacity. If the errors stop, the issue was “Preview” capacity throttling, which you cannot fix other than by waiting or switching models.

Step 3: Regional Redundancy

If europe-west4 is legitimately experiencing “Shared Capacity” issues (which happens):

Action: Configure your client to fail over to a different region (e.g., us-central1 or europe-west1) upon receiving a 429.
Note: This requires your data residency requirements to allow processing in other regions.
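A rough sketch of the failover idea in Step 3, under stated assumptions: `call_region` is a hypothetical callable that sends your request to one region, and `ResourceExhausted` stands in for whatever exception your client raises on a 429. Neither name comes from a real SDK.

```python
class ResourceExhausted(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""


def generate_with_failover(call_region, regions):
    """Try each region in order, moving on only when a region answers with a 429."""
    last_error = None
    for region in regions:
        try:
            return region, call_region(region)
        except ResourceExhausted as err:
            last_error = err  # remember the failure and try the next region
    if last_error is None:
        raise ValueError("regions must not be empty")
    raise last_error
```

Only list regions that your data residency requirements allow. The function returns the region that served the request so you can log how often failover actually happens.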

Summary for your Engineering Team

Hypothesis: The application is likely hitting a Token Throughput (TPM) limit which is distinct from the Request (RPM) limit shown, OR it is suffering from Service Health/Capacity throttling on the gemini-3-flash-preview model.

cas_stg · January 19, 2026, 10:45am

Additional evidence:

We are seeing 429 RESOURCE_EXHAUSTED errors on gemini-3-flash-preview even when token usage is well below 50k TPM and the quota dashboard shows “Unlimited” for this model.

The error responses contain no quota name or dimension.

This strongly suggests shared capacity or preview-model admission control rather than a visible quota limit.
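For anyone who wants to run the same check on their own responses, here is a small sketch. It assumes the error body follows the standard Google API JSON error shape, where quota violations, when reported, appear under `error.details` as `google.rpc.QuotaFailure` or `google.rpc.ErrorInfo` entries. The function name is hypothetical.

```python
import json


def quota_details(error_body):
    """Return any QuotaFailure or ErrorInfo entries found in a JSON error body."""
    payload = json.loads(error_body)
    details = payload.get("error", {}).get("details", [])
    wanted = ("google.rpc.QuotaFailure", "google.rpc.ErrorInfo")
    # "@type" values look like "type.googleapis.com/google.rpc.QuotaFailure".
    return [d for d in details if d.get("@type", "").endswith(wanted)]
```

An empty list for a 429 body means the response names no quota or dimension, which is the case for the error shown at the top of this thread.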

Could you please confirm whether our project is being throttled due to preview model capacity and whether a GA alternative or capacity adjustment is recommended?

Mateo_Hysa · January 23, 2026, 4:09pm

I have been experiencing the same thing with other models. I thought Vertex AI was supposed to be the reliable option, given the extra steps to set it up, but that does not seem to be the case.
