We are using the Gemini Image API in a production SaaS platform.
When generating high-resolution images (2K–4K), we frequently receive:
503 – “The model is overloaded. Please try again later.”
This happens even at low concurrency (1 request at a time).
The same request sometimes works locally or at different times of day, which suggests capacity variability.
Here is a screenshot from our production logs:
[screenshot of production logs]
Before redesigning our pipeline, we would like official clarification on:
Is 4K image generation considered best-effort rather than guaranteed?
What resolution is recommended for stable production usage (e.g., 1024px / 1536px)?
Does upgrading the service tier improve queue priority for high-resolution image requests?
Any guidance from the Gemini engineering team would be greatly appreciated.
4K generation is a supported capability, but its availability is governed by Dynamic Capacity Management. Because 4K output requires significantly more compute and longer reasoning cycles (part of the Gemini 3 “Thinking” process), the API may return a 503 Service Unavailable error during peak periods even if you have not exceeded your RPM/TPM limits (see Gemini Error Codes).
For SaaS platforms that require consistent, low-latency responses, Standard High Definition is recommended over Ultra HD (4K) (see Media Resolution).
Upgrading from Tier 1 to Tier 3 increases your rate limits (RPM/TPM), but more importantly, it grants access to Provisioned Throughput options.
Based on the above, consider retrying 503 errors with exponential backoff, and falling back to a lower resolution (2K) if the request still fails after several attempts.
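As a rough illustration of the retry-and-fallback idea, here is a minimal Python sketch. It does not call the Gemini SDK directly: `generate` is a hypothetical callable you would supply to wrap your own image request, and `OverloadedError` is a hypothetical exception your wrapper would raise when the API answers with a 503. The resolution labels, attempt count, and delay values are assumptions to tune for your workload, and the random "full jitter" on each delay is one common backoff variant rather than anything the API requires.

```python
import random
import time
from typing import Callable, Sequence, Tuple


class OverloadedError(Exception):
    """Hypothetical stand-in for an HTTP 503 'model is overloaded' response."""


def generate_with_fallback(
    generate: Callable[[str], bytes],
    resolutions: Sequence[str] = ("4K", "2K"),
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, bytes]:
    """Try each resolution in order, retrying 503s with exponential backoff.

    Returns (resolution_used, image_bytes). Raises OverloadedError if every
    resolution is still overloaded after max_attempts tries each.
    """
    for resolution in resolutions:
        for attempt in range(max_attempts):
            try:
                return resolution, generate(resolution)
            except OverloadedError:
                if attempt == max_attempts - 1:
                    break  # give up on this resolution, fall back to the next
                # Exponential backoff: the cap doubles each attempt, up to
                # max_delay; the actual wait is a random value below the cap.
                cap = min(max_delay, base_delay * (2 ** attempt))
                sleep(random.uniform(0, cap))
    raise OverloadedError("all resolutions overloaded after retries")
```

To use it, pass a function that issues your real image request at the given resolution and raises `OverloadedError` when the response status is 503; other errors (for example 429 rate-limit responses) are deliberately left to propagate so they can be handled separately.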