Hi team,
We’re hitting persistent 503 UNAVAILABLE responses on the Gemini Developer API
in production. Workload is invoice PDF processing (document understanding).
Error
HTTP 503
{
  "error": {
    "code": 503,
    "status": "UNAVAILABLE",
    "message": "This model is currently experiencing high demand. Spikes in demand are usually temporary. Please try again later."
  }
}
Endpoint
POST https://generativelanguage.googleapis.com/v1beta/models/:generateContent
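For reproduction, here is a minimal sketch of how a request of this shape might be built. It assumes the PDF is sent inline as base64 under `inline_data` and that the key is passed in the `x-goog-api-key` header. The function name `build_request`, the prompt text, and the model argument are placeholders for illustration, not our production code.

```python
import base64
import json
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(model: str, api_key: str, pdf_bytes: bytes,
                  prompt: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request with an inline PDF.

    `model` is a placeholder: substitute the model ID from the Setup section.
    """
    body = {
        "contents": [{
            "parts": [
                # Assumption: the PDF is small enough to send inline as base64.
                {"inline_data": {
                    "mime_type": "application/pdf",
                    "data": base64.b64encode(pdf_bytes).decode("ascii"),
                }},
                {"text": prompt},
            ]
        }]
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,
        },
        method="POST",
    )
```

Sending the result with `urllib.request.urlopen(req)` raises `urllib.error.HTTPError` on a 503, and the `code` attribute of that exception carries the status.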
Setup
- Model: <gemini 3 flash preview>
- Billing: Paid tier enabled on the linked Cloud project
- API key created: Google AI Studio, owner account
- Region of caller:
Observed
- First seen (UTC): 2026-05-23
- Frequency: <~30% of requests over the last 1h>
- Payload: PDF, ~ pages, ~ MB
- Retry result: Still fails after exponential backoff (delays of 1 s, 2 s, 4 s)
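The retry pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact production code: `call_with_backoff`, the `send` callable (which returns an HTTP status code), and the optional `jitter` parameter are hypothetical names introduced here.

```python
import random
import time
from typing import Callable, Sequence, Tuple

RETRYABLE_STATUSES = {503}  # assumption: only UNAVAILABLE is retried


def call_with_backoff(
    send: Callable[[], int],
    delays: Sequence[float] = (1.0, 2.0, 4.0),
    sleep: Callable[[float], None] = time.sleep,
    jitter: float = 0.0,
) -> Tuple[int, int]:
    """Call `send`, retrying on retryable statuses with the given delays.

    Returns (final_status, attempts). With the default delays this makes
    at most four attempts: the initial call plus three retries.
    """
    status = send()
    attempts = 1
    for delay in delays:
        if status not in RETRYABLE_STATUSES:
            break
        # Optional random jitter spreads retries out across callers.
        sleep(delay + random.uniform(0.0, jitter))
        status = send()
        attempts += 1
    return status, attempts
```

In our case the equivalent of `send` keeps returning 503 through the final 4 s retry for the affected requests.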
Already tried
- Switching model: <gemini-2.5-pro → gemini-2.5-flash>, result: <…>
- Confirmed not hitting RPM/TPM quota
- Verified API key + billing on the project
Business impact
The production invoice pipeline is degraded; a manual fallback is in place.
Asks
- Is this a capacity issue (regional) or something scoped to our project?
- Can you recommend a stable model for the paid tier in this region?
- Any path to reserved/provisioned capacity on the Developer API,
or should we migrate this workload to Vertex AI?