
# gemini-3-pro-image-preview: 37% error rate due to content-based 429 RESOURCE_EXHAUSTED - blocks legitimate platform usage

## Environment

- **Model:** gemini-3-pro-image-preview

- **Endpoint:** Global (location: “global”)

- **SDK:** @google/genai (Node.js)

- **Auth:** Vertex AI service account

- **Quota page shows:** “Unlimited” for image generation requests

## Summary

We are building a content creation platform that uses gemini-3-pro-image-preview to generate images on behalf of users. Our platform constructs detailed prompts (including identity preservation instructions, style references, and editing instructions) so that users can generate professional content without needing to write prompts themselves.

We are experiencing a consistent 37% error rate on GenerateContent calls, all returning 429 RESOURCE_EXHAUSTED. Our quotas page shows “Unlimited” for image generation requests on the global endpoint, confirming this is not a standard quota issue. Out of 1,564 total GenerateContent requests, approximately 585 have failed with 429.

Additionally, we have spent over $500 on Vertex AI but are told we are “not eligible” to purchase paid support, leaving us with no way to get direct technical assistance.

## Detailed Findings

We have conducted extensive testing over several weeks to understand the exact behavior of the 429 errors. Here is what we have found:

### Finding 1: The 429 is triggered by repeated similar prompts

When we send the same prompt (for example, generating an image of a specific character) 3-5 times consecutively, the API starts returning 429 RESOURCE_EXHAUSTED. This happens regardless of how much time passes between requests. We tested waiting 24 hours between identical prompts and the 429 persisted on the next attempt with the same prompt.

### Finding 2: Different prompts sometimes work, sometimes don’t

After receiving a 429 for a given prompt, sending a completely different prompt succeeds approximately 50% of the time. This suggests there are two overlapping issues: content-based repetition detection AND general shared capacity constraints on the preview model.

### Finding 3: The detection is semantic-level, not text-level

We tested minor word-level changes to prompts (for example, changing “red apple” to “green apple”) and these did NOT bypass the detection. We also tried using Gemini Flash to completely rephrase prompts using entirely different wording, sentence structure, and vocabulary while preserving the same visual meaning. The rephrased prompts were also blocked. This indicates the detection operates on semantic similarity, not exact text matching.

### Finding 4: Adding metadata or unique identifiers does not help

We appended unique identifiers to each prompt (e.g., `[generation:unique-uuid]`) to make each request technically unique at the text level. This had no effect. The detection system appears to ignore structured metadata and fingerprint only the actual content.

### Finding 5: The detection is NOT project-scoped – it operates at the model infrastructure level

This is our most significant finding. We created a brand new GCP project with a completely separate service account that had never made a single API request. We linked it to the same billing account and enabled Vertex AI. The very first request from this new project, using a prompt that was previously blocked on our primary project, also returned 429 RESOURCE_EXHAUSTED.

This proves the detection is not per-project and not per-API-key. It operates at Google’s model serving backend, shared across all projects. A content fingerprint flagged on one project is blocked on all projects hitting the same backend infrastructure.

### Finding 6: Quota page shows “Unlimited” but 429s persist

The system limit for gemini-3.0-pro-image-preview_default_res on the global endpoint shows:

- Value: Unlimited

- Current usage: 560

- Adjustable: No

Despite “Unlimited” quota, 37% of requests fail. This confirms the 429s are not from hitting a quota ceiling.

## Why This Is a Problem for Legitimate Use Cases

Our platform serves users who:

1. **Iterate on character designs** – A user creates a character and generates multiple images in different poses, outfits, or scenarios. The prompts are semantically similar (same character description) with minor variations. This triggers the repetition detection after 3-5 generations.

2. **Re-generate when results aren’t satisfactory** – If the first generation doesn’t look right, users click “Generate” again with the same or slightly modified prompt. This is the most basic expected behavior on any image generation platform, and it triggers the block.

3. **Use identity preservation mode** – Our platform supports maintaining a consistent character identity across multiple generations. By definition, these prompts share significant semantic overlap because they describe the same person. The repetition detection treats this as spam.

4. **Edit existing images** – Users upload an image and request edits (change outfit, change background, etc.). The base prompt describing the image stays similar across edits, triggering the detection.

All of these are legitimate, expected use patterns for an image generation platform. The current behavior makes gemini-3-pro-image-preview unsuitable for any interactive image generation product where users iterate on their work.

## What We Have Tried (None of These Work)

| Approach | Result |
|---|---|
| Multiple GCP projects with separate service accounts | Same 429 – detection is cross-project at the model backend |
| LLM-based semantic prompt rewriting (Gemini Flash) | Still caught by semantic fingerprinting |
| Adding unique UUID metadata to each prompt | System ignores metadata, fingerprints content only |
| Word-level prompt modifications | Semantic detection too robust |
| Exponential backoff with jitter (up to 12 s delays) | Time alone does not reset the detection window |
| Waiting 24 hours between identical prompts | Still blocked until a genuinely different prompt is sent |
| Using the global endpoint | Already using it – same behavior |
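For reference, the backoff we used follows the standard full-jitter pattern. A minimal sketch; the `err.status === 429` shape is an assumption about how the error surfaces, and the base/cap values simply match the 12 s maximum in the table above:

```javascript
// Exponential backoff with full jitter: delay is uniform in
// [0, min(cap, base * 2^attempt)).
function backoffDelayMs(attempt, baseMs = 500, capMs = 12000) {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * exp;
}

// Retry `fn` on 429 only, up to `maxAttempts` attempts.
async function withRetry(fn, maxAttempts = 6) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (err.status !== 429 || attempt === maxAttempts - 1) throw err;
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
    }
  }
}
```

As the table notes, this helps with transient capacity contention but does nothing against the content-based blocking, since time alone does not reset the detection window.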

## Technical Details of Our Implementation

- We send 1 image generation request at a time per user

- Maximum 2 concurrent requests per user, 10 concurrent globally

- Prompts range from 300 to 2,000 tokens

- Requests may include 0-3 reference images (for identity preservation or editing)

- We use exponential backoff with jitter on 429

- We implement project-based failover (primary project → fallback project → backoff → retry)

- Average successful request latency: ~22 seconds

- 99th percentile latency: ~2 minutes (for requests that eventually succeed after delays)
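The project-based failover mentioned above can be sketched as follows. `send` is a hypothetical function standing in for one generateContent call against a given GCP project; it is assumed to throw an error carrying `status: 429` when throttled:

```javascript
// Try each project in order; move to the next only on a 429.
// Any other error is a real failure and is rethrown immediately.
async function generateWithFailover(send, prompt, projects) {
  let lastErr;
  for (const project of projects) {
    try {
      return await send(project, prompt);
    } catch (err) {
      if (err.status !== 429) throw err;
      lastErr = err; // throttled: fall through to the next project
    }
  }
  throw lastErr; // all projects throttled; caller backs off and retries
}
```

In practice this failover did not help against the content-based 429s, consistent with Finding 5: the same fingerprint is blocked on every project.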

## Questions for the Vertex AI Team

1. Is there a way to whitelist a project or billing account to exempt it from the content-based repetition throttling? Our traffic is legitimate platform usage, not abuse.

2. What is the exact mechanism behind the content-based 429 behavior? Is it documented anywhere? We could not find any documentation describing semantic-level content fingerprinting causing 429 errors.

3. Is this behavior specific to the preview stage of gemini-3-pro-image-preview? Will it be relaxed when the model reaches GA?

4. What is the recommended architecture for platforms that generate similar content iteratively (character design tools, image editing tools, identity-consistent generation)?

5. Why does our quota page show “Unlimited” while 37% of requests fail with 429? What is the actual limiting factor?

6. We are spending $500+ on Vertex AI but are told we are “not eligible” for paid support. How can we get technical support as a paying customer?

## Reproduction Steps

1. Create a Vertex AI project with pay-as-you-go billing

2. Use gemini-3-pro-image-preview via the global endpoint

3. Send the same image generation prompt 3-5 times consecutively

4. Observe: after 3-5 requests, all subsequent requests with the same or semantically similar prompt return 429 RESOURCE_EXHAUSTED

5. Send a completely different prompt – it will succeed approximately 50% of the time

6. Return to the original prompt – it may work again after the pattern is broken
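The loop in steps 3-4 can be sketched as below. `send` is a hypothetical stand-in for a single generateContent call via @google/genai; it is assumed to resolve on success and throw an error with `status: 429` when throttled:

```javascript
// Send the same prompt `n` times and count successes vs. 429s.
async function sendRepeated(send, prompt, n) {
  let ok = 0, throttled = 0;
  for (let i = 0; i < n; i++) {
    try {
      await send(prompt);
      ok++;
    } catch (err) {
      if (err.status === 429) throttled++;
      else throw err; // non-throttling errors are real failures
    }
  }
  return { ok, throttled };
}

// In our runs, `throttled` starts climbing after 3-5 identical prompts.
```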

## Expected Behavior

Requests within quota limits (which show “Unlimited”) should succeed regardless of content similarity, as long as the caller is not exceeding rate limits. Content-based throttling, if it must exist, should be documented and should have configurable thresholds or an exemption process for legitimate platforms.

## Actual Behavior

37% of all GenerateContent requests fail with 429 RESOURCE_EXHAUSTED despite “Unlimited” quota. The failures correlate with semantic content similarity across consecutive requests and persist across separate GCP projects and service accounts.

---

This is a blocking issue for our product launch. Any guidance or resolution would be greatly appreciated.


Hi BigEagle888,

I’m afraid the Vertex AI team doesn’t monitor this forum, and we don’t pass feedback from here to them.

You may have more luck on the Google Cloud forum.

That said, I’ll provide the advice I’d give if this was an issue with the Gemini Developer API, perhaps some of it will be helpful:

As far as I know we (Developer API) don’t have any blocking for similar requests.

429 (“Too Many Requests”) errors are usually triggered if you exceed one of our listed quotas:

- Requests per minute (RPM)
- Tokens per minute (TPM)
- Requests per day (RPD)

It sounds to me like you might be tripping RPM limits at least some of the time. I’d suggest checking all quotas: in AI Studio, go to Dashboard > API Keys > View usage (the bar-chart symbol next to your project) and check the Quota & Rate Limit tabs. Usage data is delayed ~15 minutes.

If you’re running into RPM or TPM limits, consider implementing backoff and retry with a long maximum delay (our SDKs, which support both the Developer API and Vertex, support this through retry config).

If this still doesn’t solve the issue, you could be hitting internal quota limits. We’re planning to expose the cause of these in the Gemini Developer API starting in Q1 2026; I don’t know whether Vertex has similar limits or plans.

How can I be hitting these limits as a single user? There are no other people generating, only me.

I’m having the same problem. I’m a single user building a rapid prototype, and I’ve hit a wall where I’m constantly getting 429s.

Same problem for several days already. Before that, batch requests worked well with the same setup. Now I’m constantly getting 429s for gemini-2.5-flash (it has become unusable), and for gemini-3-pro-preview and gemini-3-flash-preview as well.

We’ve started seeing this in the past 12 hours or so. Most requests go through just fine, but every so often a request gets rejected with a 429 error, despite us being well within our quota, and despite requests succeeding either side of the 429 error. We’re also seeing that backoff and retry isn’t working well, supporting the OP’s hypothesis that it’s something about the prompts themselves, not the account’s usage, that is causing these errors.

It seems that Google uses 429 to mean “our server is too busy right now” instead of the standard 503 Service Unavailable. From Standard PayGo | Generative AI on Vertex AI | Google Cloud Documentation:

> If you receive a 429 error, it doesn’t indicate that you’ve hit a fixed quota. It indicates temporary high contention for a specific shared resource.

In my opinion it is quite rude to return 429 in this instance, implying that it’s something that the user caused and has the power to fix. In fact, it may have nothing to do with what the user requested, and everything to do with Google just not having enough resources to service all its customers at the same time.

A cynic might assume that returning a 4xx error instead of a 5xx error is done to fudge the stats on uptime and availability… :thinking:

In a parallel thread, a person from Google just posted that they noticed the problem and will escalate it internally. That was 7 hours ago.

Can you imagine? It’s been a month since the first reports.
