[503 UNAVAILABLE] Persistent high-demand errors in production — invoice PDF processing

Hi team,

We’re hitting persistent 503 UNAVAILABLE responses on the Gemini Developer API
in production. Workload is invoice PDF processing (document understanding).

Error

HTTP 503
{
  "error": {
    "code": 503,
    "status": "UNAVAILABLE",
    "message": "This model is currently experiencing high demand. Spikes in demand are usually temporary. Please try again later."
  }
}
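A minimal sketch of how a client could classify that error body before deciding whether to retry. Treating both 503/UNAVAILABLE and 429/RESOURCE_EXHAUSTED as retryable is an assumption, not documented Gemini API behavior, and `is_retryable` is a hypothetical helper name.

```python
import json

# Assumption: these statuses/codes are the transient errors worth retrying.
RETRYABLE_STATUSES = {"UNAVAILABLE", "RESOURCE_EXHAUSTED"}
RETRYABLE_HTTP_CODES = {503, 429}


def is_retryable(http_status: int, body: str) -> bool:
    """Decide whether an error response looks transient.

    Prefers the `error.status` string from the JSON body; falls back to the
    HTTP status code when the body is missing or not in the expected shape.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None

    status = None
    if isinstance(payload, dict) and isinstance(payload.get("error"), dict):
        status = payload["error"].get("status")

    if status is not None:
        return status in RETRYABLE_STATUSES
    return http_status in RETRYABLE_HTTP_CODES
```

Checking the body first means a 503 with an unexpected status string is not retried blindly, while a non-JSON body still falls back to the HTTP code.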

Endpoint

POST https://generativelanguage.googleapis.com/v1beta/models/:generateContent

Setup

  • Model: gemini 3 flash preview
  • Billing: Paid tier enabled on the linked Cloud project
  • API key: created in Google AI Studio under the owner account
  • Region of caller:

Observed

  • First seen (UTC): 2026-05-23
  • Frequency: ~30% of requests over the last hour
  • Payload: PDF, ~ pages, ~ MB
  • Retry result: still fails with exponential backoff (1 s, 2 s, 4 s)
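The 1/2/4 s schedule gives up after roughly 7 s of total waiting. A minimal sketch of a longer, capped schedule with full jitter (each delay drawn uniformly between zero and the exponential ceiling) is below; `call_with_backoff` and `TransientError` are hypothetical names, and the defaults (6 attempts, 1 s base, 32 s cap) are assumptions. Jitter keeps many workers from retrying in lockstep, but it will not help if the model stays at capacity.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for a retryable failure such as HTTP 503."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Retry fn on TransientError using capped exponential backoff with full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time in [0, min(cap, base * 2**attempt)].
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0, ceiling))
    raise RuntimeError("unreachable")
```

Injecting `sleep` and `rng` keeps the helper testable without real waiting.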

Already tried

  • Switching model: gemini-2.5-pro → gemini-2.5-flash, result: <…>
  • Confirmed we are not hitting RPM/TPM quotas
  • Verified the API key and billing on the project
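Since switching models was one of the mitigations tried, here is a minimal sketch of an ordered model fallback that moves to the next model only on a capacity error. `generate` stands in for whatever wrapper issues the actual `generateContent` call (for example through the google-genai SDK) and raises the hypothetical `ModelUnavailable` on a 503; the model list is illustrative.

```python
from typing import Callable, Optional, Sequence, Tuple


class ModelUnavailable(Exception):
    """Hypothetical error the wrapper raises when the API returns 503 UNAVAILABLE."""


def generate_with_fallback(
    generate: Callable[[str, bytes], str],
    pdf_bytes: bytes,
    models: Sequence[str],
) -> Tuple[str, str]:
    """Try each model in order; return (model_used, result) from the first success."""
    if not models:
        raise ValueError("models must not be empty")
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model, generate(model, pdf_bytes)
        except ModelUnavailable as exc:
            # Only capacity errors trigger fallback; any other error propagates.
            last_error = exc
    raise ModelUnavailable("all models unavailable: " + ", ".join(models)) from last_error
```

Returning the model name alongside the result makes it easy to log how often the fallback fires.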

Business impact

The production invoice pipeline is degraded; a manual fallback is in place.

Asks

  1. Is this capacity-scoped (regional) or project-scoped?
  2. Guidance on a stable model for the paid tier in this region.
  3. Any path to reserved/provisioned capacity on the Developer API,
    or should we migrate this workload to Vertex AI?