Gemini 2.0 Flash API: Long Response Times and 503 GOAWAY Errors with PDF Base64 Input

Issue Summary

I’m experiencing intermittent timeout issues when calling the Gemini API using the gemini-2.0-flash model with PDF files encoded as base64. The timeouts occur after 600 seconds, followed by 503 GOAWAY errors, despite eventually receiving successful responses.

Technical Details

  • Model: gemini-2.0-flash
  • Input: PDF files sent as base64 encoded data
  • Framework: LangChain with LangSmith for monitoring
  • Error: RetryError: Timeout of 600.0s exceeded, last exception: 503 GOAWAY received
  • Observed Duration: ~950 seconds (logged in LangSmith) when issue occurs
  • Frequency: Intermittent - not every request

Request Pattern

Using LangChain to send a GenerateContent call to the Gemini API with PDF files converted to base64 and included in the request payload.

Key Observations

  1. Not quota-related: The errors don’t correlate with “GenerateContent input token count limit per model per minute” quota issues
  2. Eventually succeeds: Responses are eventually received despite the timeout errors
  3. Consistent duration: When the issue occurs, LangSmith consistently logs ~950 seconds duration
  4. PDF-specific: Issue seems to occur specifically with PDF base64 inputs
  5. Retry behavior: Most notably, when retrying the exact same PDF just a minute after the first request, it receives a response in just a few seconds as expected
  6. Not size-dependent: The issue doesn’t necessarily correlate with large prompts - it occurred, for example, with a request containing ~12,000 prompt tokens and generating only 49 completion tokens

Questions

  1. Is this a known issue with PDF processing in gemini-2.0-flash?
  2. Could this be related to PDF processing latency on Google’s backend?