Hi Gemini API Team,
We run a production image-restyling SaaS and are seeing critical reliability issues with the `gemini-3-pro-image-preview` model. Our system is currently forced into a 6-step fallback chain because the primary API calls are failing.
1. 100% Failure Rate (HTTP 400) via Direct API
Every request to the direct `generativelanguage` endpoint that includes an image fails, even though text-only calls with the same key succeed.
- Endpoint: `https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent`
- Payload: `contents[0].parts = [{text: ~19KB}, {inlineData: {mimeType: "image/png", data: ~1-2MB}}]`
- Result: 100% `400 INVALID_ARGUMENT` (20/20 latest attempts).
- Observation: The same payload succeeds ~75% of the time when routed through a third-party gateway (OpenAI-compatible proxy).
- Question: Is there a strict (undocumented) token or byte limit for the combined text+image payload on this specific preview model? (A minimal repro sketch follows this list.)
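For reference, here is a stripped-down version of the failing call. The API key, file path, and prompt text are placeholders; the payload shape matches what we send in production:

```python
import base64
import requests

API_KEY = "..."  # placeholder
URL = ("https://generativelanguage.googleapis.com/v1beta/models/"
       "gemini-3-pro-image-preview:generateContent")

with open("input.png", "rb") as f:  # 1-2 MB source image
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "contents": [{
        "parts": [
            {"text": "<~19KB restyling prompt>"},  # placeholder
            {"inlineData": {"mimeType": "image/png", "data": image_b64}},
        ]
    }]
}

resp = requests.post(URL, params={"key": API_KEY}, json=payload, timeout=60)
print(resp.status_code)  # 400 on all of our last 20 attempts
print(resp.text)         # "status": "INVALID_ARGUMENT", no further detail
```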
2. Intermittent “Text-only” Responses (HTTP 200, but no image)
When the model does respond via the gateway, it frequently fails to produce the image modality.
- Behavior: HTTP 200 OK, but the response contains only text reasoning (describing what it would have generated).
- Frequency: ~25% of successful calls.
- Question: Is there a parameter to force image output? We’ve tested `responseModalities` in different orders (`["TEXT", "IMAGE"]` vs `["IMAGE", "TEXT"]`) without consistent results. Is there a `required_modalities` or similar flag? (Our current configuration is sketched below.)
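For completeness, this is the configuration we are testing (continuing the repro above), plus the heuristic we use to detect text-only responses; the part-inspection logic is our own, not an official pattern:

```python
# Continuing the repro above: request both modalities explicitly.
payload["generationConfig"] = {
    "responseModalities": ["IMAGE", "TEXT"],  # also tried ["TEXT", "IMAGE"]
}

resp = requests.post(URL, params={"key": API_KEY}, json=payload, timeout=60)
parts = (resp.json()
         .get("candidates", [{}])[0]
         .get("content", {})
         .get("parts", []))

# In ~25% of HTTP 200 responses there is no inlineData part at all,
# only text describing the image the model "would have" generated.
if not any("inlineData" in part for part in parts):
    pass  # fall through to the next step of our fallback chain
```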
3. Vertex AI Availability & Consistency
We use `gemini-3.1-pro-preview` successfully on Vertex AI for analysis, but image generation with `gemini-3-pro-image-preview` seems unavailable or broken on Vertex.
- Question: Is this model officially supported for image generation on `aiplatform.googleapis.com`? If so, what is the exact payload structure? (Our current attempt is sketched after this list.)
- Latency: We are seeing validation timeouts (~35s) on Vertex. Is there a recommended timeout setting for multimodal validation tasks?
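For context, this is the request shape we are attempting on Vertex. We are assuming the standard publisher-model `generateContent` path; the project, region, and reuse of the payload above are placeholders/assumptions, so please correct whatever is wrong here:

```python
import requests
import google.auth
import google.auth.transport.requests

PROJECT = "my-project"   # placeholder
REGION = "us-central1"   # placeholder
VERTEX_URL = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT}"
    f"/locations/{REGION}/publishers/google/models/"
    "gemini-3-pro-image-preview:generateContent"
)

# Vertex AI uses OAuth bearer tokens instead of API keys.
creds, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"])
creds.refresh(google.auth.transport.requests.Request())

resp = requests.post(
    VERTEX_URL,
    headers={"Authorization": f"Bearer {creds.token}"},
    json=payload,  # same contents/generationConfig as the snippets above
    timeout=120,   # raised well past the ~35s mark where we see timeouts
)
```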
4. Specific Technical Requests
To stabilize our production environment, we need clarification on:
- Token Budget: How does a ~2MB inline image count against the token limit for this model? (We try to measure this ourselves with `countTokens`; see the sketch after this list.)
- Temperature: Is there a recommended temperature (e.g., 0.8) to maximize image generation success vs. text reasoning?
- Rate Limits: What are the specific RPM/TPM limits for the `image-preview` model?
- Error Verbosity: Can you provide more detail for `INVALID_ARGUMENT`? The current error body is non-actionable.
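On the token-budget point, this is how we currently try to measure the image’s footprint ourselves; we are assuming `countTokens` accepts the same `contents` shape for this preview model:

```python
# Continuing the repro above: ask the API how it prices the payload.
count_url = ("https://generativelanguage.googleapis.com/v1beta/models/"
             "gemini-3-pro-image-preview:countTokens")

resp = requests.post(count_url, params={"key": API_KEY},
                     json={"contents": payload["contents"]}, timeout=30)
print(resp.json())  # expecting {"totalTokens": ...} -- is the image included?
```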
Impact: Our current workaround takes 50-60s due to multiple fallback steps. We need a reliable “Image In → Image Out” flow to maintain our production SLA.
We can provide Request IDs and full JSON payloads for your internal debugging.
Best regards,
Nichlas