Does context caching work with Batch API requests? I have it working perfectly in my online requests.
However, when I try to do the same using a JSONL file uploaded to GCS, Gemini is not using the context I provide. Am I doing something wrong, or is it just not enabled yet?
For batch jobs, each line in your JSONL file represents a single request, and the cachedContent field must be a peer of the contents and generationConfig fields within that request.
To fix this, adjust how you construct the request dictionary in your Python script: add the cachedContent key directly to the request_data dictionary.
import uuid
import json
# Assuming 'list_prompts', 'list_cache', and 'config' are defined
jsonl_data = []
for prompt, cache in zip(list_prompts, list_cache):
    request_data = {
        "contents": [
            {
                "role": "user",
                "parts": [{"text": prompt}],
            }
        ],
        "generationConfig": config.gemini_config.to_json_dict(),
    }

    # Add the cachedContent field at the top level of the request if a cache exists
    if cache:
        request_data["cachedContent"] = cache.name

    jsonl_entry = {
        "key": str(uuid.uuid4()),
        "request": request_data,
    }
    jsonl_data.append(jsonl_entry)
# To create the JSONL file content:
# jsonl_content = "\n".join(json.dumps(entry) for entry in jsonl_data)
# print(jsonl_content)
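For reference, a single serialized line of the JSONL file would then look roughly like the sketch below; the key, prompt text, generationConfig values, and cache resource name are all placeholders, not values from your project.

import json

# Minimal sketch of one JSONL line: cachedContent sits at the same level
# as contents and generationConfig inside "request". All values are placeholders.
example_line = {
    "key": "example-key-1",
    "request": {
        "contents": [
            {"role": "user", "parts": [{"text": "Summarize the attached report."}]}
        ],
        "generationConfig": {"temperature": 0.2},
        "cachedContent": "projects/PROJECT_ID/locations/LOCATION/cachedContents/CACHE_ID",
    },
}
print(json.dumps(example_line))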
By making this change, your batch job should run correctly with cached context for each request.
Bad Request: {"error": {"code": 400, "message": "Model gemini-2.5-flash-001 does not support cached content with batch prediction.", "status": "INVALID_ARGUMENT"}}
Context caching is not currently supported with the Batch API. You can refer to the Context caching overview and its limitations for more details on these features.