Gemini Batch Requests are Running Very Slowly and Timing Out

I am running Gemini batch prediction jobs on files through the Vertex AI Batch API and have recently run into issues.

For the past ~2 weeks, our batch jobs that run on files have slowed down tremendously. Before this, batches ranging from 500 to 150,000 rows completed within the 24-hour window with no issue. Now almost all of them time out at the 24-hour execution window, and in the failed jobs very few of the rows have actually been processed, so results are missing or absent entirely.

Looking at the outputs, most of the requests hit a 429 Resource Exhausted error, and there are also some 500 responses from the Gemini Batch API.
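
To put numbers on "most of the requests", something like the following could tally an output file by error code. This is only a minimal sketch. It assumes each output line is a JSON object with a `status` string that is empty on success and otherwise embeds the error payload after some prefix text, as in the error samples in this post. The function names are hypothetical.

```python
import json
from collections import Counter


def extract_error_code(status: str):
    """Return the error code embedded in a status string, or None if absent."""
    start = status.find("{")
    if start == -1:
        return None
    try:
        # raw_decode parses the first JSON value and ignores anything after it.
        payload, _ = json.JSONDecoder().raw_decode(status, start)
    except json.JSONDecodeError:
        return None
    error = payload.get("error")
    return error.get("code") if isinstance(error, dict) else None


def tally_statuses(lines):
    """Count rows by outcome: 'ok', an integer error code, or 'unparsed'."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        status = row.get("status") or ""  # assumed field name
        if not status:
            counts["ok"] += 1
            continue
        code = extract_error_code(status)
        counts[code if code is not None else "unparsed"] += 1
    return counts
```

Called on an open output file (for example `tally_statuses(open("predictions.jsonl"))`, where the file name is a placeholder), this returns a `Counter` keyed by `"ok"`, `429`, `500`, and so on.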

We originally noticed this issue when running on gemini-2.0-flash-lite. We then switched to gemini-2.5-flash-lite to see whether more resources were available for that model, but we are still running into the same issue.

The Gemini batch documentation states: "There are no predefined quota limits on your usage. Instead, batch service provides access to a large, shared pool of resources, dynamically allocated based on availability of resources and real-time demand across all customers of that model." See: Batch prediction with Gemini | Generative AI on Vertex AI | Google Cloud.

Since we are using Dynamic Shared Quota (DSQ) for provisioning, we have no control over the 429 errors. And because the jobs time out after 24 hours, we also can’t simply run the Gemini batch jobs more slowly to avoid overwhelming DSQ.
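
To make the timing constraint concrete, here is a rough sketch of the minimum average rate a job must sustain to finish inside the window. It assumes one row maps to one request and that work is spread evenly over the window, neither of which the batch service guarantees. `min_rows_per_minute` is a hypothetical helper name.

```python
def min_rows_per_minute(total_rows: int, window_hours: float = 24.0) -> float:
    """Lowest average rate (rows/minute) that still finishes inside the window."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return total_rows / (window_hours * 60)


if __name__ == "__main__":
    # Batch sizes mentioned in this post.
    for rows in (500, 150_000):
        print(f"{rows:>7} rows -> {min_rows_per_minute(rows):.2f} rows/min")
```

Under those assumptions, the largest batch mentioned here (150,000 rows) needs roughly 104 rows per minute on average to complete within 24 hours.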

If anyone else has encountered this issue or has any suggestions for a solution, please advise.

To provide more context, here are the error outputs:

500 error

Internal error occurred. Failed to get generateContentResponse: {"error": {"code": 500, "message": "Internal error encountered.", "status": "INTERNAL"}}

429 error

RESOURCE_EXHAUSTED error occurred: {"error": {"code": 429, "message": "Quota exceeded for quota metric 'Online prediction requests' and limit 'Online prediction requests per minute per region' of service 'aiplatform.googleapis.com' for consumer 'project_number:945369697288'.", "status": "RESOURCE_EXHAUSTED", "details": [{"@type": "type.googleapis.com/google.rpc.ErrorInfo", "reason": "RATE_LIMIT_EXCEEDED", "domain": "googleapis.com", "metadata": {"quota_unit": "1/min/{project}/{region}", "quota_limit": "OnlinePredictionRequestsPerMinutePerProjectPerRegion", "consumer": "projects/945369697288", "quota_location": "us-east1", "quota_metric": "aiplatform.googleapis.com/online_prediction_requests", "quota_limit_value": "90000", "service": "aiplatform.googleapis.com"}}, {"@type": "type.googleapis.com/google.rpc.Help", "links": [{"description": "Request a higher quota limit.", "url": "https://cloud.google.com/docs/quotas/help/request_increase"}]}]}}
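
For anyone comparing against their own failures, the quota details can be pulled out of a 429 message like the one above. This is a minimal sketch, assuming the message is free text followed by a JSON payload with a `google.rpc.ErrorInfo` entry, as in the sample. `SAMPLE_STATUS` is a trimmed copy of that sample (the `message` field and the help links are omitted) and `quota_details` is a hypothetical helper name.

```python
import json

# Trimmed copy of the 429 output above; the "message" field and help links are omitted.
SAMPLE_STATUS = (
    'RESOURCE_EXHAUSTED error occurred: {"error": {"code": 429, '
    '"status": "RESOURCE_EXHAUSTED", "details": [{"@type": '
    '"type.googleapis.com/google.rpc.ErrorInfo", "reason": "RATE_LIMIT_EXCEEDED", '
    '"domain": "googleapis.com", "metadata": {"quota_unit": "1/min/{project}/{region}", '
    '"quota_location": "us-east1", '
    '"quota_metric": "aiplatform.googleapis.com/online_prediction_requests", '
    '"quota_limit_value": "90000"}}]}}'
)


def quota_details(status: str) -> dict:
    """Return the ErrorInfo metadata dict from an error string, or {} if not found."""
    start = status.find("{")
    if start == -1:
        return {}
    try:
        payload, _ = json.JSONDecoder().raw_decode(status, start)
    except json.JSONDecodeError:
        return {}
    for detail in payload.get("error", {}).get("details", []):
        if detail.get("@type", "").endswith("google.rpc.ErrorInfo"):
            return detail.get("metadata", {})
    return {}


if __name__ == "__main__":
    info = quota_details(SAMPLE_STATUS)
    print(info.get("quota_metric"), info.get("quota_location"), info.get("quota_limit_value"))
```

For the error above, this surfaces the metric `aiplatform.googleapis.com/online_prediction_requests`, the location `us-east1`, and a limit value of `90000` per minute.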

Online prediction requests per minute per region metric: