I’m on Gemini Tier 1, and I’ve been trying to use the gemini-embedding-001 model in a project that requires embedding large amounts of text. I keep getting rate limited and I can’t tell what’s causing it. Nowhere in the docs can I find any limits listed for gemini-embedding-001 on the Gemini Batch API.
I checked both the Google Cloud Console and AI Studio, and neither shows me hitting a limit. AI Studio reports 2/3K RPM and 132/1M TPM for gemini-embedding-001, but I’m guessing those apply to the synchronous endpoint rather than the batch async endpoint I’m using (asyncBatchEmbedContent), which is supposed to have much higher limits.
What I’ve observed:

- **First request of the day gets 429:** After not making any requests for 14+ hours, with 0 pending batch jobs, the very first request to asyncBatchEmbedContent returned 429 Too Many Requests.
- **Inconsistent token limits:** A request with ~245,000 tokens (500 chunks) got a 429. After reducing to ~131,000 tokens (300 chunks), it went through. But a subsequent request with ~113,000 tokens still got a 429. (I estimate tokens by dividing the character count by 4, which isn’t exact but close enough.)
- **Appears to be both request-count and token based:** After 1-2 successful batch job creations, subsequent requests get 429 regardless of token count. The limit seems to reset after ~15-20 minutes.
- **No error details:** The Batch API response gives no indication of which limit was hit; there’s no Retry-After header or anything similar.
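For reference, here is roughly what my token estimate and retry logic look like (a simplified sketch, not my actual pipeline code; `RateLimitError` and `submit` are placeholders for the real google-genai client call and its 429 exception, and the backoff constants are illustrative):

```python
import random
import time


class RateLimitError(Exception):
    """Placeholder for the client exception raised on HTTP 429."""


def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token. Not exact, but close
    # enough for sizing batches.
    return len(text) // 4


def submit_with_backoff(submit, max_attempts=11, base=10.0, cap=300.0,
                        sleep=time.sleep):
    """Retry `submit` on 429 with jittered exponential backoff.

    Since the API returns no Retry-After header, the delay is guessed:
    base * 2^(attempt - 1), capped, with +/-20% jitter.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except RateLimitError:
            if attempt == max_attempts:
                raise
            delay = min(cap, base * 2 ** (attempt - 1)) * random.uniform(0.8, 1.2)
            sleep(delay)
```

Even with 11 attempts and delays growing past two minutes, the 429s in the logs below persist.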
**First logs:**
2026-01-13 15:56:06,672 - INFO - Starting the pipeline…
2026-01-13 15:56:08,480 - INFO - Found 500 chunks to embed
2026-01-13 15:56:08,480 - INFO - Splitting into 1 batch(es) of up to 500 chunks each
2026-01-13 15:56:08,610 - INFO - Batch stats: 500 chunks, 980,121 chars, ~245,030 tokens (estimated)
2026-01-13 15:56:09,226 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/upload/v1beta/files "HTTP/1.1 200 OK"
2026-01-13 15:56:11,017 - INFO - File uploaded: https://generativelanguage.googleapis.com/v1beta/files/[FILE_ID_REDACTED]
2026-01-13 15:56:11,017 - INFO - Single batch prepared: 500 chunks, ~245,030 tokens
2026-01-13 15:56:11,017 - INFO - Creating batch embedding job…
2026-01-13 15:56:11,017 - INFO - Using resource name: files/[FILE_ID_REDACTED]
ExperimentalWarning: batches.create_embeddings() is experimental and may change without notice.
  job = self.client.batches.create_embeddings(…)
2026-01-13 15:56:12,095 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:asyncBatchEmbedContent "HTTP/1.1 429 Too Many Requests"
2026-01-13 15:56:12,098 - WARNING - Rate limited on create embedding batch (attempt 1/11). Using backoff: 16.6s
2026-01-13 15:56:29,914 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:asyncBatchEmbedContent "HTTP/1.1 429 Too Many Requests"
2026-01-13 15:56:29,917 - WARNING - Rate limited on create embedding batch (attempt 2/11). Using backoff: 59.5s
2026-01-13 15:57:30,636 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:asyncBatchEmbedContent "HTTP/1.1 429 Too Many Requests"
2026-01-13 15:57:30,637 - WARNING - Rate limited on create embedding batch (attempt 3/11). Using backoff: 113.3s
2026-01-13 15:59:25,132 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:asyncBatchEmbedContent "HTTP/1.1 429 Too Many Requests"
2026-01-13 15:59:25,199 - WARNING - Rate limited on create embedding batch (attempt 4/11). Using backoff: 139.7s
**Second logs:**
2026-01-13 16:06:54,881 - INFO - Found 500 chunks to embed
2026-01-13 16:06:54,881 - INFO - Splitting into 2 batch(es) of up to 300 chunks each
2026-01-13 16:06:54,977 - INFO - Batch submission plan: 2 batches, 30s delay between submissions
2026-01-13 16:06:55,087 - INFO - Batch stats: 300 chunks, 527,051 chars, ~131,762 tokens (estimated)
2026-01-13 16:06:59,148 - INFO - File uploaded: https://generativelanguage.googleapis.com/v1beta/files/[FILE_ID_1]
2026-01-13 16:06:59,149 - INFO - Creating batch embedding job…
2026-01-13 16:07:01,003 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:asyncBatchEmbedContent "HTTP/1.1 200 OK"
2026-01-13 16:07:01,004 - INFO - Batch embedding job created: batches/[BATCH_ID_1]
2026-01-13 16:07:01,142 - INFO - Child batch 1/2 submitted (~131,762 tokens)
2026-01-13 16:07:01,142 - INFO - Waiting 30s before submitting batch 2/2…
2026-01-13 16:07:31,296 - INFO - Preparing embedding batch for 200 chunks
2026-01-13 16:07:31,308 - INFO - Batch stats: 200 chunks, 453,070 chars, ~113,267 tokens (estimated)
2026-01-13 16:07:34,019 - INFO - File uploaded: https://generativelanguage.googleapis.com/v1beta/files/[FILE_ID_2]
2026-01-13 16:07:34,019 - INFO - Creating batch embedding job…
2026-01-13 16:07:34,859 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:asyncBatchEmbedContent "HTTP/1.1 429 Too Many Requests"
2026-01-13 16:07:34,860 - WARNING - Rate limited on create embedding batch (attempt 1/11). Using backoff: 20.8s
2026-01-13 16:07:56,811 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:asyncBatchEmbedContent "HTTP/1.1 429 Too Many Requests"
2026-01-13 16:07:56,813 - WARNING - Rate limited on create embedding batch (attempt 2/11). Using backoff: 46.8s
2026-01-13 16:08:45,488 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:asyncBatchEmbedContent "HTTP/1.1 200 OK"
2026-01-13 16:08:45,491 - INFO - Batch embedding job created: batches/[BATCH_ID_2]
2026-01-13 16:08:45,745 - INFO - Child batch 2/2 submitted (~113,267 tokens)
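The batch splitting in the second run above is nothing special, essentially just (simplified sketch; `split_into_batches` is an illustrative name, not the real API):

```python
def split_into_batches(chunks, max_per_batch=300):
    """Split a list of chunks into consecutive batches of at most
    max_per_batch items each, e.g. 500 chunks -> batches of 300 and 200."""
    return [chunks[i:i + max_per_batch]
            for i in range(0, len(chunks), max_per_batch)]
```

Each resulting batch is uploaded as its own file and submitted as its own job, with a 30s delay between submissions, yet the second submission still hits 429s.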
I’ve been searching for a solution for days at this point and can’t find this documented anywhere. I also requested a rate limit increase a few weeks ago and got no response. Can anyone please help me, or let me know what’s causing this and how I can fix it? The batch endpoint is completely unusable for me at this point.