Gemini-2.5-flash-image: Frequent 429 RESOURCE_EXHAUSTED during sequential image generation - seeking clarity on rate limits

Joakim_Sarnelid · January 26, 2026, 9:24am

Context

I’m building a children’s storybook app that generates 4 comic panel images per story using gemini-2.5-flash-image via Vertex AI. The app flow is:

User captures a drawing
AI interprets the drawing (text-only, works fine)
Backend generates 4 comic panels sequentially (one after another, not parallel)

The sequential generation is intentional - each panel includes the previous 1-2 images for visual consistency (character appearance, art style).
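
As a rough illustration of that flow, here is a minimal sketch of the sequential loop. The `generate` callable and its signature are hypothetical stand-ins for the real Vertex AI call, and `context_size=2` reflects the "previous 1-2 images" described above.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: (prompt, reference_images) -> generated image bytes.
# In the real app this would wrap the gemini-2.5-flash-image request.
GenerateFn = Callable[[str, Sequence[bytes]], bytes]


def generate_panels(
    prompts: Sequence[str],
    generate: GenerateFn,
    context_size: int = 2,
) -> List[bytes]:
    """Generate panels one at a time, passing the most recent panels as context."""
    panels: List[bytes] = []
    for prompt in prompts:
        # Guard against context_size == 0, since panels[-0:] would be the whole list.
        context = panels[-context_size:] if context_size > 0 else []
        panels.append(generate(prompt, context))
    return panels
```

Because each call depends on the output of the previous one, the four requests cannot be parallelized or reordered, so a single failed panel blocks the rest of the story.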

The Problem

Even with sequential requests and exponential backoff retries, I consistently hit 429 RESOURCE_EXHAUSTED errors, typically on the 3rd or 4th panel of a story. The timing between successful requests is 10-40+ seconds depending on image generation time plus any backoff delays.

Configuration:

Model: gemini-2.5-flash-image
Endpoint: Global (location="global")
Project: Vertex AI on GCP (pay-as-you-go billing)
Retry strategy: 3 attempts with exponential backoff (5s, 10s, 20s)
Request pattern: Strictly sequential, one image at a time
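
For reference, the retry strategy listed above could look roughly like the sketch below. `RateLimitError` is a hypothetical placeholder for whatever exception the client library raises on a 429, and I am assuming "3 attempts" means three retries after the initial request, with waits of 5s, 10s and 20s.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the client's 429 RESOURCE_EXHAUSTED error."""


def call_with_backoff(
    fn: Callable[[], T],
    delays: Sequence[float] = (5.0, 10.0, 20.0),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError after each delay in turn."""
    for delay in delays:
        try:
            return fn()
        except RateLimitError:
            sleep(delay)  # wait before the next attempt
    # Final attempt: any error raised here propagates to the caller.
    return fn()
```

The `sleep` parameter is injectable only so the behaviour can be checked without actually waiting.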

Actual test run (2026-01-26):

10:02:50 - Panel 1: SUCCESS in 6.5s
10:02:56 - Panel 2: SUCCESS in 10.4s
10:03:07 - Panel 3: 429 RESOURCE_EXHAUSTED
           Retry after 5s backoff: 429 again
           Retry after 10s backoff: 429 again
           All 3 retries failed (32.8s total)
10:04:10 - Panel 4: SUCCESS in 9.7s (after 30s wait from panel 3 failure)

Observations:

Hit 429 on 3rd request, only ~10 seconds after 2nd request succeeded
Rate limit persisted for 32.8s of retry attempts
After waiting 30s, the next request succeeded immediately
Effective throughput: ~2 requests per minute before hitting limits
No Retry-After header in 429 responses (would be very helpful for backoff tuning)
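
As a sanity check on the throughput figure, the timeline can be recomputed from the logged timestamps. This is just arithmetic on the numbers in the test run above; counting only the three successful panels over the whole run is my own assumption about how to define "effective throughput".

```python
from datetime import datetime, timedelta

day = datetime(2026, 1, 26)
panel1_start = day.replace(hour=10, minute=2, second=50)
panel3_start = day.replace(hour=10, minute=3, second=7)

retry_window = timedelta(seconds=32.8)  # total time spent on failed retries
cooldown = timedelta(seconds=30)        # wait before starting panel 4

# Panel 4 should start after the retry window plus the cooldown.
panel4_start = panel3_start + retry_window + cooldown
panel4_end = panel4_start + timedelta(seconds=9.7)

# Three panels succeeded (1, 2 and 4) over the whole run.
elapsed_s = (panel4_end - panel1_start).total_seconds()
successes_per_minute = 3 / elapsed_s * 60

print(panel4_start.time(), round(elapsed_s, 1), round(successes_per_minute, 2))
```

The computed start time for panel 4 lands within a second of the logged 10:04:10, and the run averages roughly two successful images per minute.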

Questions

Is this rate limiting expected? With only a few requests per minute at most (one image every 15-40 seconds), I expected to be well within limits. The quota dashboard shows minimal usage.
Do tier thresholds affect image generation? The Standard PayGo documentation states that “usage tiers don’t apply” to image generation models. However, I’m currently below Tier 1 spend thresholds (~$0 rolling 30-day spend). Could being in this low-spend state still result in more aggressive throttling for image generation, even if not documented as part of the tier system?
Are there known capacity constraints? I’ve seen other threads mentioning traffic-related 429s for Gemini image models. Is gemini-2.5-flash-image currently experiencing capacity constraints?
Recommendations? Besides Provisioned Throughput (which seems like significant overkill for 4 images/story during development), are there strategies I should try:
- Different regional endpoints?
- Specific time-of-day patterns?
- Request modifications (smaller prompts, no multi-image context)?
Retry-After header? Is there a plan to include Retry-After or similar headers in 429 responses? This would help clients implement smarter backoff without guessing.
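
To make the Retry-After request concrete, this is a sketch of how a client could use such a header if it were present, falling back to capped exponential backoff when it is not. The function name, `base` and `cap` values are illustrative assumptions. The header parsing follows the two standard HTTP forms (delay in seconds or an HTTP date), not anything specific to Vertex AI.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_delay(
    headers: Mapping[str, str],
    attempt: int,
    base: float = 5.0,
    cap: float = 60.0,
    now: Optional[datetime] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Header names are case-insensitive, so match on the lowercased key.
    value = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
    if value is not None:
        value = value.strip()
        if value.isdigit():
            return float(value)  # delay-seconds form
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            when = None  # unparseable header: fall through to backoff
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            current = now or datetime.now(timezone.utc)
            return max(0.0, (when - current).total_seconds())
    # No usable header: exponential backoff (5s, 10s, 20s, ...) up to the cap.
    return min(cap, base * 2 ** attempt)
```

Without the header, the client can only guess, which is what the fallback branch does.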

What I’ve Tried

Using global endpoint (as recommended)
Tested regional endpoint (us-central1) - same rate limiting behavior
Exponential backoff with 3 retries (5s, 10s, 20s delays)
Sequential requests only (no parallelism)
Verified billing is active and linked correctly
Confirmed quota dashboard shows very low usage

tylertreat · January 26, 2026, 9:55pm

I started hitting the same thing in the last few days with gemini-2.5-flash-image after months of no issue. Also seems to happen on the third sequential request. It seems like something changed on Google’s side.

Rene_Augusto_Negrao · January 27, 2026, 12:37am

I'm having the same problem too, though with image editing rather than stories. It edits an image and modifies it twice, and on the third attempt to change it, it gives the message "Retry-After" and "You’ve reached your rate limit. Please try again later."
It doesn't show the wait time, and even though I'm testing in other windows and clearing the cache, the problem continues. This isn't new: it started the day before yesterday, the 24th, but less frequently.

Joakim_Sarnelid · January 27, 2026, 8:27am

That’s interesting, thanks for sharing. Yes, it seems strange that it’s failing so consistently. That doesn’t quite match my understanding of what 429s should indicate (i.e., actual resource exhaustion of a global quota shared between all Vertex AI pay-as-you-go users).

tylertreat · January 27, 2026, 3:07pm

I agree it seems odd. I found these other recent threads which appear to be related:

Sterling · January 27, 2026, 11:01pm

Likewise, this has been hitting us pretty hard since ~1/22, when it spiked significantly. I’ve also tried the approaches listed here. We’re likewise generating a batch of several images per workflow. Our absolute daily volume is relatively low though, so it’s not like we’re spamming the model.

If I exclude exponential backoff retries, I’d say 75% of our requests are failing with 429s. Even with super long, gratuitous backoffs, we’re still seeing ~35% of our requests ultimately failing. Our image gen times have gone from 30s to 3 minutes, and half are broken. Our customers are not very happy with us.

I understand capacity is an issue, and that provisioned throughput is the obvious solution, but I have to admit I’m frustrated that the availability is so abysmal here and that the extremely low availability isn’t communicated more clearly. If this can’t be used for more than a toy project, and guarantees are so low, we need to know that upfront so we can evaluate our providers better.

