Spike in 429 RESOURCE_EXHAUSTED with v1beta1 StreamGenerateContent (Gemini 3 Flash Preview / Vertex global) — quotas look fine

Hi all — we’re seeing a persistent increase in 429 RESOURCE_EXHAUSTED errors in production when calling Gemini via Vertex AI streaming.

What changed

  • Previously our stack was using:

    • google.cloud.aiplatform.v1.PredictionService.StreamGenerateContent

    • This was noticeably more stable (rare 429s).

  • Recently it switched to:

    • google.cloud.aiplatform.v1beta1.PredictionService.StreamGenerateContent

    • Since then we’re seeing lots of RESOURCE_EXHAUSTED failures.

We’re calling from google-adk, using streaming responses. The ADK library appears to hardcode the Vertex client to v1beta1, so it’s difficult to test v1 without patching.

Symptoms

  • Error: google.adk.models.google_llm._ResourceExhaustedError (maps to 429 RESOURCE_EXHAUSTED)

  • Model: gemini-3-flash-preview

  • Location: global (Gemini 3 Flash preview seems global-only on Vertex)

  • Happens despite quotas looking within limits in the console.

  • These are not huge bursts — it occurs during normal interactive chat traffic.

Questions

  1. Is anyone else seeing an uptick in 429 RESOURCE_EXHAUSTED specifically with v1beta1 StreamGenerateContent in the last ~1–2 weeks?

  2. Are there additional/granular limits (per-model / per-project / per-stream concurrency / per-minute token limits) that don’t show up clearly in the standard quota charts?

  3. Does RESOURCE_EXHAUSTED reliably distinguish between:

    • quota exceeded vs

    • backend capacity / shared contention
      …and is there a recommended way to tell which one we’re hitting (e.g., specific metric strings, headers, gRPC details like RetryInfo)?

  4. Any best practices for stability here beyond:

    • client-side concurrency limiting

    • exponential backoff with jitter

    • token/context reduction

    • or purchasing Provisioned Throughput for gemini-3-flash-preview?

Happy to share anonymized logs if helpful (timestamps, approximate request rates, concurrent stream counts, input/output token estimates, etc.). Mainly trying to understand whether this is a known issue with current global capacity / v1beta1 routing, and what the recommended mitigation is.

Thanks!

The 429 RESOURCE_EXHAUSTED with v1beta1 StreamGenerateContent often isn’t just about visible quotas. It can reflect per-model or per-stream concurrency limits, backend capacity, or token-rate limits that aren’t shown in the console. RESOURCE_EXHAUSTED doesn’t always distinguish between quota vs. backend contention, but gRPC details like RetryInfo, or monitoring serving.googleapis.com/quota_exceeded metrics, can help identify the cause.

For stability, keep client-side concurrency low, use exponential backoff with jitter, reduce token/context size, and consider Provisioned Throughput if consistent capacity is needed. The uptick with v1beta1 likely relates to routing changes and global shared load.
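The two client-side mitigations above can be sketched together: cap concurrent streams with a semaphore and retry 429s with full-jitter exponential backoff. The `call_model` stub, the exception class, and the limits are illustrative stand-ins, not ADK or Vertex SDK APIs.

```python
# Sketch: concurrency gate + full-jitter exponential backoff for 429s.
import random
import threading
import time

MAX_CONCURRENT_STREAMS = 4  # tune to stay under observed limits
stream_gate = threading.BoundedSemaphore(MAX_CONCURRENT_STREAMS)


class ResourceExhausted(Exception):
    """Stand-in for the SDK's 429 RESOURCE_EXHAUSTED error."""


def with_backoff(call, max_attempts=5, base=0.5, cap=30.0):
    """Run `call` under the concurrency gate, retrying 429s with jitter."""
    for attempt in range(max_attempts):
        with stream_gate:  # never hold more than N streams open at once
            try:
                return call()
            except ResourceExhausted:
                if attempt == max_attempts - 1:
                    raise
        # Full jitter: sleep a random amount up to the exponential ceiling.
        time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))


# Illustrative stub that fails twice with 429, then succeeds.
attempts = {"n": 0}

def call_model():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ResourceExhausted()
    return "ok"

print(with_backoff(call_model))  # ok
```

If the server does return a RetryInfo delay, prefer it over the computed backoff for that attempt.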


We are seeing this exact same issue too, and only with `gemini-3-pro-image-preview` through Vertex Studio, which routes through google.cloud.aiplatform.ui.PredictionService.StreamGenerateContent and quite possibly hits v1beta1 under the hood.

It’s not intermittent either: it’s been like this for the past several days, and we get nothing back but 429s.

There are multiple reports of this across Vertex and Gemini API users, some going back months, but no resolution.

It’s definitely not about quotas for us. This really feels like an infrastructure-level problem: general platform instability with this model in particular, and with how shared throughput is being managed (or not).