Some batch embedding jobs finish with a 0-byte result file

miin1635_patsol · March 30, 2026, 1:26am

[Issue] Batch Embedding result file is 0 bytes after `JOB_STATE_SUCCEEDED`

Summary

When using the google-genai Python SDK, client.files.download() intermittently returns an empty bytes object (0 bytes) even after the Batch Embedding job status has reached JOB_STATE_SUCCEEDED. No exceptions are raised, but the downloaded file contains no data.

Step-by-Step Workflow

We are using the Gemini Batch Embedding API (asynchronous) to process large JSONL corpora. Our implementation follows these steps:

1. Input Preparation & Upload

We generate a JSONL file where each line follows the required Batch API structure. Note the use of the key field for metadata mapping:

{
  "key": "unique_chunk_id_001",
  "request": {
    "model": "models/gemini-embedding-001",
    "content": { "parts": [{ "text": "The text to embed..." }] },
    "taskType": "RETRIEVAL_DOCUMENT",
    "title": "Optional Document Title",
    "outputDimensionality": 3072
  }
}
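A minimal sketch of how such a file could be produced, assuming the field names in the example above are the ones the API expects. The helper names `build_request_line` and `write_jsonl` are hypothetical and not part of the SDK. Returning the list of keys makes it possible to compare them against the output later.

```python
import json


def build_request_line(key, text, title=None, output_dimensionality=3072):
    """Return one JSONL line in the request shape shown in the post."""
    request = {
        "model": "models/gemini-embedding-001",
        "content": {"parts": [{"text": text}]},
        "taskType": "RETRIEVAL_DOCUMENT",
        "outputDimensionality": output_dimensionality,
    }
    if title is not None:
        request["title"] = title
    # json.dumps escapes embedded newlines, so each record stays on one line.
    return json.dumps({"key": key, "request": request})


def write_jsonl(path, chunks):
    """Write (key, text) pairs to path, one request per line; return the keys."""
    keys = []
    with open(path, "w", encoding="utf-8") as f:
        for key, text in chunks:
            f.write(build_request_line(key, text) + "\n")
            keys.append(key)
    return keys
```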

2. Job Submission

The file is uploaded via client.files.upload() and the batch job is created:

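from google.genai import types

# client is an existing genai.Client; uploaded_file_name comes from client.files.upload()
job = client.batches.create_embeddings(
    model="models/gemini-embedding-001",
    src=types.EmbeddingsBatchJobSource(file_name=uploaded_file_name),
    config={"display_name": "batch_emb_run_v1"}
)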

3. Polling & Success State

We poll client.batches.get() until the state is JOB_STATE_SUCCEEDED.
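The polling step might look roughly like the sketch below. It takes the fetch call as a parameter so it can be exercised without the SDK. The helper name `wait_for_job`, the interval, and the timeout are assumptions, and only `JOB_STATE_SUCCEEDED` appears in this post; the other terminal state names are assumed.

```python
import time

# Only JOB_STATE_SUCCEEDED is confirmed by the post; the rest are assumed.
TERMINAL_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_EXPIRED",
}


def wait_for_job(get_job, poll_seconds=30.0, timeout_seconds=24 * 3600,
                 sleep=time.sleep, clock=time.monotonic):
    """Call get_job() until the job reaches a terminal state, then return it."""
    deadline = clock() + timeout_seconds
    while True:
        job = get_job()
        if job.state.name in TERMINAL_STATES:
            return job
        if clock() >= deadline:
            raise TimeoutError(
                f"job still in {job.state.name} after {timeout_seconds}s"
            )
        sleep(poll_seconds)


# Usage with the objects from the post:
# job_name = job.name
# job = wait_for_job(lambda: client.batches.get(name=job_name))
```

Stopping on every terminal state, not only success, keeps the loop from spinning forever on a failed job.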

4. Result Download (The Failure Point)

Once successful, we retrieve the output file name from job.dest.file_name and attempt to download:

# job.state.name == "JOB_STATE_SUCCEEDED"
result_file_name = job.dest.file_name  # e.g., "files/output_123"
content: bytes = client.files.download(file=result_file_name)

# Result: len(content) is 0

Problem Description

Intermittent Occurrence: This does not happen every time. Some result files download correctly, while others, from jobs run under the same conditions and projects, return 0 bytes.
No Exceptions: The download() call completes without 404, 403, or any SDK-level errors.
Valid Metadata: job.dest.file_name always contains a valid string path.
Empty Output: The returned bytes object is b"".
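Because no exception is raised, the empty result has to be detected by the caller. The sketch below turns a 0-byte payload into an explicit error and reports which submitted keys are missing. It assumes each output line is a JSON object that echoes the input "key" at the top level, which this post does not confirm; `check_batch_output` is a hypothetical helper name.

```python
import json


def check_batch_output(content, expected_keys):
    """Return (found_keys, missing_keys) for a downloaded result file.

    Raises ValueError when the payload is empty, so a 0-byte download
    cannot be mistaken for a valid result.
    """
    if not content:
        raise ValueError("downloaded result is empty (0 bytes)")
    found = set()
    for raw in content.decode("utf-8").split("\n"):
        if not raw.strip():
            continue  # skip blank lines such as the trailing newline
        record = json.loads(raw)
        # Assumption: the output echoes the input "key" at the top level.
        if "key" in record:
            found.add(record["key"])
    missing = set(expected_keys) - found
    return found, missing
```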

Environment

Key	Value
SDK	`google-genai` (Python)
Model	`models/gemini-embedding-001`
Request Schema	Wrapped in `"request"` field with `"key"` identifier
Parallelism	Multiple billing projects/API keys used concurrently

Questions to the Community

Consistency/Race Condition: Is it possible for the job state to hit SUCCEEDED before the File API has finished committing the result buffer to storage?
Pre-download Validation: Is it recommended to poll client.files.get(name=result_file_name) and check if size_bytes > 0 before calling download()?
Correct Endpoint: Is client.files.download() the standard way to fetch batch results, or is there a more robust method (e.g., streaming) recommended for embeddings?
Retry Strategy: Since the API thinks the call was successful (no error code), should we implement a manual retry loop specifically for 0-byte responses?
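For the retry question, a manual loop could be sketched as follows. Whether retrying helps depends on the empty result being transient, which is not established here. `download_nonempty`, `EmptyDownloadError`, the attempt count, and the delays are all hypothetical choices.

```python
import time


class EmptyDownloadError(RuntimeError):
    """Raised when every download attempt returned zero bytes."""


def download_nonempty(download, max_attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call download() until it returns non-empty bytes, backing off exponentially."""
    for attempt in range(max_attempts):
        content = download()
        if content:
            return content
        # Wait before the next attempt, but not after the last one.
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    raise EmptyDownloadError(
        f"download returned 0 bytes on all {max_attempts} attempts"
    )


# Usage with the call from the post:
# content = download_nonempty(lambda: client.files.download(file=result_file_name))
```

Raising after the last attempt keeps a persistently empty file from being passed downstream as if it were a valid result.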

Any insights from the Google team or other developers who have faced this “empty success” issue would be greatly appreciated!

This post was written by AI.
Any help would be appreciated.
