Gemini Batch Requests are Running Very Slowly and Timing Out

justin-gryps · September 12, 2025, 4:32pm

I am using the Gemini Vertex Batch API to run batch Gemini jobs over files, and I have recently been running into issues.

For the past ~2 weeks, our batch processes that run on files have slowed down tremendously. Batches are failing to finish within the 24-hour window, which leaves us with missing or no results. Before this, batches ran with no issue, and batches ranging in size from 500 to 150,000 rows completed within the 24-hour window. Now almost all of them fail within the 24-hour execution window, and in the failed batch jobs very few of the rows have actually been run.
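For scale, here is a quick sketch of the average throughput each batch needs in order to finish inside the 24-hour window. It assumes "batch size" means the number of request rows; `min_requests_per_minute` is a hypothetical helper, not part of any Google API.

```python
# Minimum average request rate needed to finish a batch inside the
# 24-hour batch execution window, for the batch sizes mentioned above.
WINDOW_MINUTES = 24 * 60  # 1,440 minutes


def min_requests_per_minute(batch_size: int, window_minutes: int = WINDOW_MINUTES) -> float:
    """Average rate (requests/min) required to process batch_size rows in the window."""
    return batch_size / window_minutes


for size in (150_000, 500):
    print(f"{size:>7} rows -> {min_requests_per_minute(size):.2f} requests/min")
```

Even the 150,000-row batch needs only about 104 requests per minute on average. This is a uniform-rate idealization; the batch service does not necessarily dispatch requests at a steady rate.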

Looking at the outputs, most of the requests hit a 429 RESOURCE_EXHAUSTED error, and some get 500 responses from the Gemini Batch API.
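To put numbers on how many rows hit each error, a sketch that tallies error codes across a batch output file. It assumes each output row is a JSON object whose `status` field is empty on success and otherwise holds an error string containing a JSON `"code"`, like the messages quoted later in this thread. The field name and the `tally_status_codes` helper are assumptions, not a documented contract.

```python
import json
import re
from collections import Counter
from typing import Iterable

# Assumption: each output line is a JSON object whose "status" field is empty
# on success and holds an error string (with an embedded JSON "code") on failure.
CODE_RE = re.compile(r'"code":\s*(\d+)')


def tally_status_codes(lines: Iterable[str]) -> Counter:
    """Count output rows by error code; successful rows are counted as 'ok'."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the JSONL file
        status = json.loads(line).get("status", "")
        if not status:
            counts["ok"] += 1
            continue
        match = CODE_RE.search(status)
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

For example, `with open("predictions.jsonl", encoding="utf-8") as f: print(tally_status_codes(f).most_common())`, where `predictions.jsonl` is a hypothetical local copy of the job output.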

We first noticed this issue when running on gemini-flash-2-lite, but after switching to gemini-flash-2.5-lite to see whether more resources were available, we are still running into the same issue.

The Gemini batch documentation states: "There are no predefined quota limits on your usage. Instead, batch service provides access to a large, shared pool of resources, dynamically allocated based on availability of resources and real-time demand across all customers of that model." See: Batch prediction with Gemini | Generative AI on Vertex AI | Google Cloud.

Since we are using Dynamic Shared Quota (DSQ) for provisioning, we have no control over the 429 errors. But because the jobs time out after 24 hours, we also can't simply run the Gemini batch jobs more slowly to avoid overwhelming DSQ.
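Because failed rows are simply missing from the results, one option is to pull the failed requests out of the output and resubmit them as a smaller follow-up job. A minimal sketch, assuming output rows carry the original `request` alongside a `status` string (empty on success) and that input rows use the `{"request": {...}}` shape. `failed_requests` is a hypothetical helper, and this thread does not establish whether smaller follow-up jobs fare any better under DSQ.

```python
import json
from typing import Iterable, List


def failed_requests(output_lines: Iterable[str]) -> List[str]:
    """Return input-format JSONL lines for every output row that did not succeed.

    Assumption: output rows look like {"status": "...", "request": {...}, ...}
    with an empty "status" on success; input rows look like {"request": {...}}.
    """
    retry: List[str] = []
    for line in output_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        row = json.loads(line)
        # A non-empty status marks a failed row; keep its original request.
        if row.get("status") and "request" in row:
            retry.append(json.dumps({"request": row["request"]}))
    return retry
```

The returned lines can be written to a new `.jsonl` file and submitted as another batch job, so that rows which already succeeded are not paid for or processed twice.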

If anyone else has encountered this issue or has any suggestions for a solution, please advise.

justin-gryps · September 12, 2025, 4:37pm

Also, to provide more context, here are the error outputs:

500 error

Internal error occurred. Failed to get generateContentResponse: {"error": {"code": 500, "message": "Internal error encountered.", "status": "INTERNAL"}}

429 error

RESOURCE_EXHAUSTED error occurred: {"error": {"code": 429, "message": "Quota exceeded for quota metric 'Online prediction requests' and limit 'Online prediction requests per minute per region' of service 'aiplatform.googleapis.com' for consumer 'project_number:945369697288'.", "status": "RESOURCE_EXHAUSTED", "details": [{"@type": "type.googleapis.com/google.rpc.ErrorInfo", "reason": "RATE_LIMIT_EXCEEDED", "domain": "googleapis.com", "metadata": {"quota_unit": "1/min/{project}/{region}", "quota_limit": "OnlinePredictionRequestsPerMinutePerProjectPerRegion", "consumer": "projects/945369697288", "quota_location": "us-east1", "quota_metric": "aiplatform.googleapis.com/online_prediction_requests", "quota_limit_value": "90000", "service": "aiplatform.googleapis.com"}}, {"@type": "type.googleapis.com/google.rpc.Help", "links": [{"description": "Request a higher quota limit.", "url": "https://cloud.google.com/docs/quotas/help/request_increase"}]}]}}
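Both error strings are a text prefix followed by a standard `google.rpc` error JSON, so the quota details in the 429 can be extracted programmatically. A sketch (the helper name `quota_metadata` is hypothetical; it raises `ValueError` if the string contains no JSON object):

```python
import json
from typing import Any, Dict

ERROR_INFO_TYPE = "type.googleapis.com/google.rpc.ErrorInfo"


def quota_metadata(status: str) -> Dict[str, Any]:
    """Extract the google.rpc.ErrorInfo metadata from a batch error string.

    The error strings are a text prefix followed by a JSON object, so decode
    from the first "{" onward. Returns {} if there is no ErrorInfo detail.
    """
    payload = json.loads(status[status.index("{"):])
    for detail in payload["error"].get("details", []):
        if detail.get("@type") == ERROR_INFO_TYPE:
            return detail.get("metadata", {})
    return {}
```

Applied to the 429 message above, it returns the `ErrorInfo` metadata, including `quota_metric` (`aiplatform.googleapis.com/online_prediction_requests`), `quota_location` (`us-east1`) and `quota_limit_value` (`90000`). The 500 message has no `details`, so it returns an empty dict.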

[Screenshot: Online prediction requests per minute per region metric]
