# gemini-3-pro-image-preview: 37% error rate due to content-based 429 RESOURCE_EXHAUSTED - blocks legitimate platform usage
## Environment
- **Model:** gemini-3-pro-image-preview
- **Endpoint:** Global (location: “global”)
- **SDK:** @google/genai (Node.js)
- **Auth:** Vertex AI service account
- **Quota page shows:** “Unlimited” for image generation requests
## Summary
We are building a content creation platform that uses gemini-3-pro-image-preview to generate images on behalf of users. Our platform constructs detailed prompts (including identity preservation instructions, style references, and editing instructions) so that users can generate professional content without needing to write prompts themselves.
We are experiencing a consistent 37% error rate on GenerateContent calls, all returning 429 RESOURCE_EXHAUSTED. Our quotas page shows “Unlimited” for image generation requests on the global endpoint, confirming this is not a standard quota issue. Out of 1,564 total GenerateContent requests, approximately 585 have failed with 429.
Additionally, we have spent over $500 on Vertex AI but are told we are “not eligible” to purchase paid support, leaving us with no way to get direct technical assistance.
## Detailed Findings
We have conducted extensive testing over several weeks to understand the exact behavior of the 429 errors. Here is what we have found:
### Finding 1: The 429 is triggered by repeated similar prompts
When we send the same prompt (for example, generating an image of a specific character) 3-5 times consecutively, the API starts returning 429 RESOURCE_EXHAUSTED. This happens regardless of how much time passes between requests. We tested waiting 24 hours between identical prompts and the 429 persisted on the next attempt with the same prompt.
### Finding 2: Different prompts sometimes work, sometimes don’t
After receiving a 429 for a given prompt, sending a completely different prompt succeeds approximately 50% of the time. This suggests there are two overlapping issues: content-based repetition detection AND general shared capacity constraints on the preview model.
### Finding 3: The detection is semantic-level, not text-level
We tested minor word-level changes to prompts (for example, changing “red apple” to “green apple”) and these did NOT bypass the detection. We also tried using Gemini Flash to completely rephrase prompts using entirely different wording, sentence structure, and vocabulary while preserving the same visual meaning. The rephrased prompts were also blocked. This indicates the detection operates on semantic similarity, not exact text matching.
### Finding 4: Adding metadata or unique identifiers does not help
We appended unique identifiers to each prompt (e.g., `[generation:unique-uuid]`) to make each request technically unique at the text level. This had no effect. The detection system appears to ignore structured metadata and fingerprint only the actual content.
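The tagging from this finding looked roughly like the sketch below. The `[generation:…]` suffix format is our own convention, not anything the API defines:

```typescript
import { randomUUID } from "node:crypto";

// Append a unique tag so every request is textually unique.
// The suffix format is our own convention; per Finding 4 the backend
// appears to ignore it and fingerprint only the actual content.
function tagPrompt(prompt: string): string {
  return `${prompt} [generation:${randomUUID()}]`;
}

const a = tagPrompt("A red apple on a wooden table");
const b = tagPrompt("A red apple on a wooden table");
// a and b differ at the text level on every call,
// yet both are throttled identically.
```
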
### Finding 5: The detection is NOT project-scoped – it operates at the model infrastructure level
This is our most significant finding. We created a brand new GCP project with a completely separate service account that had never made a single API request. We linked it to the same billing account and enabled Vertex AI. The very first request from this new project, using a prompt that was previously blocked on our primary project, also returned 429 RESOURCE_EXHAUSTED.
This proves the detection is not per-project and not per-API-key. It operates at Google’s model serving backend, shared across all projects. A content fingerprint flagged on one project is blocked on all projects hitting the same backend infrastructure.
### Finding 6: Quota page shows “Unlimited” but 429s persist
The system limit for `gemini-3.0-pro-image-preview_default_res` on the global endpoint shows:
- Value: Unlimited
- Current usage: 560
- Adjustable: No
Despite “Unlimited” quota, 37% of requests fail. This confirms the 429s are not from hitting a quota ceiling.
## Why This Is a Problem for Legitimate Use Cases
Our platform serves users who:
1. **Iterate on character designs** – A user creates a character and generates multiple images in different poses, outfits, or scenarios. The prompts are semantically similar (same character description) with minor variations. This triggers the repetition detection after 3-5 generations.
2. **Re-generate when results aren’t satisfactory** – If the first generation doesn’t look right, users click “Generate” again with the same or slightly modified prompt. This is the most basic expected behavior on any image generation platform, and it triggers the block.
3. **Use identity preservation mode** – Our platform supports maintaining a consistent character identity across multiple generations. By definition, these prompts share significant semantic overlap because they describe the same person. The repetition detection treats this as spam.
4. **Edit existing images** – Users upload an image and request edits (change outfit, change background, etc.). The base prompt describing the image stays similar across edits, triggering the detection.
All of these are legitimate, expected use patterns for an image generation platform. The current behavior makes gemini-3-pro-image-preview unsuitable for any interactive image generation product where users iterate on their work.
## What We Have Tried (None of These Work)
| Approach | Result |
|---|---|
| Multiple GCP projects with separate service accounts | Same 429 – detection is cross-project at model backend |
| LLM-based semantic prompt rewriting (Gemini Flash) | Still caught by semantic fingerprinting |
| Adding unique UUID metadata to each prompt | System ignores metadata, fingerprints content only |
| Word-level prompt modifications | Semantic detection too robust |
| Exponential backoff with jitter (up to 12s delays) | Time alone does not reset the detection window |
| Waiting 24 hours between identical prompts | Still blocked until a genuinely different prompt is sent |
| Using the global endpoint | Already using it – same behavior |
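For reference, the backoff schedule in the table was computed roughly as follows (the base delay and 12 s cap are our own tuning values, not anything Google recommends):

```typescript
// Exponential backoff with full jitter, capped at 12 s.
// baseMs and capMs are our own tuning values.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 12_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * exp; // full jitter: uniform in [0, exp)
}
```

As the table notes, no delay length helped: the throttle appears keyed on content, not timing.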
## Technical Details of Our Implementation
- We send 1 image generation request at a time per user
- Maximum 2 concurrent requests per user, 10 concurrent globally
- Prompts range from 300 to 2,000 tokens
- Requests may include 0-3 reference images (for identity preservation or editing)
- We use exponential backoff with jitter on 429
- We implement project-based failover (primary project → fallback project → backoff → retry)
- Average successful request latency: ~22 seconds
- 99th percentile latency: ~2 minutes (for requests that eventually succeed after delays)
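The project-based failover from the list above can be sketched as follows; the generator callbacks here are stubs standing in for real @google/genai calls against each project:

```typescript
type GenerateFn = () => Promise<string>;

// Try each project's caller in order; on failure, back off briefly
// and fall through to the next project. Per Finding 5, this does not
// actually help: the content fingerprint is shared across projects.
async function generateWithFailover(
  projects: GenerateFn[],
  backoffMs = 100,
): Promise<string> {
  let lastErr: unknown;
  for (const generate of projects) {
    try {
      return await generate();
    } catch (err) {
      lastErr = err;
      await new Promise((r) => setTimeout(r, backoffMs));
    }
  }
  throw lastErr; // every project returned an error
}
```
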
## Questions for the Vertex AI Team
1. Is there a way to whitelist a project or billing account to exempt it from the content-based repetition throttling? Our traffic is legitimate platform usage, not abuse.
2. What is the exact mechanism behind the content-based 429 behavior? Is it documented anywhere? We could not find any documentation describing semantic-level content fingerprinting causing 429 errors.
3. Is this behavior specific to the preview stage of gemini-3-pro-image-preview? Will it be relaxed when the model reaches GA?
4. What is the recommended architecture for platforms that generate similar content iteratively (character design tools, image editing tools, identity-consistent generation)?
5. Why does our quota page show “Unlimited” while 37% of requests fail with 429? What is the actual limiting factor?
6. We are spending $500+ on Vertex AI but are told we are “not eligible” for paid support. How can we get technical support as a paying customer?
## Reproduction Steps
1. Create a Vertex AI project with pay-as-you-go billing
2. Use gemini-3-pro-image-preview via the global endpoint
3. Send the same image generation prompt 3-5 times consecutively
4. Observe: after 3-5 requests, all subsequent requests with the same or semantically similar prompt return 429 RESOURCE_EXHAUSTED
5. Send a completely different prompt – it will succeed approximately 50% of the time
6. Return to the original prompt – it may work again after the pattern is broken
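The pattern in steps 3–4 can be expressed as a self-contained simulation; the stub below reproduces the behavior we observe, with exact-match counting standing in for the real backend (the 3-request threshold matches our observations but is not a documented value, and the real detection is semantic per Finding 3):

```typescript
// Simulation of the observed throttling pattern, NOT the real backend:
// after `threshold` requests with the same prompt, further requests
// with that prompt fail with a 429-style error. The real detection is
// semantic; this stub uses exact matching for simplicity.
class FakeBackend {
  private counts = new Map<string, number>();
  constructor(private threshold = 3) {}

  generate(prompt: string): { ok: boolean; status?: number } {
    const n = (this.counts.get(prompt) ?? 0) + 1;
    this.counts.set(prompt, n);
    if (n > this.threshold) {
      return { ok: false, status: 429 }; // RESOURCE_EXHAUSTED
    }
    return { ok: true };
  }
}

const backend = new FakeBackend();
const results = Array.from({ length: 5 }, () =>
  backend.generate("a red apple on a wooden table"),
);
// First 3 requests succeed; requests 4 and 5 are throttled.
```
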
## Expected Behavior
Requests within quota limits (which show “Unlimited”) should succeed regardless of content similarity, as long as the caller is not exceeding rate limits. Content-based throttling, if it must exist, should be documented and should have configurable thresholds or an exemption process for legitimate platforms.
## Actual Behavior
37% of all GenerateContent requests fail with 429 RESOURCE_EXHAUSTED despite “Unlimited” quota. The failures correlate with semantic content similarity across consecutive requests and persist across separate GCP projects and service accounts.
---
This is a blocking issue for our product launch. Any guidance or resolution would be greatly appreciated.