Does context caching work with Batch API requests? I have it working perfectly in my online requests.
However, when I try to do the same using a JSONL file uploaded to GCS, Gemini is not using the context I provide. Am I doing something wrong, or is it just not enabled yet?
For batch jobs, each line in your JSONL file represents a single request, and the cachedContent field must be a peer of the contents and generationConfig fields within that request.
To fix this, adjust how you construct the request dictionary in your Python script: add the cachedContent key directly to the request_data dictionary.
import uuid
import json
# Assuming 'list_prompts', 'list_cache', and 'config' are defined
jsonl_data = []
for prompt, cache in zip(list_prompts, list_cache):
    request_data = {
        "contents": [
            {
                "role": "user",
                "parts": [{"text": prompt}],
            }
        ],
        "generationConfig": config.gemini_config.to_json_dict(),
    }

    # Add the cachedContent field at the top level of the request if a cache exists
    if cache:
        request_data["cachedContent"] = cache.name

    jsonl_entry = {
        "key": str(uuid.uuid4()),
        "request": request_data,
    }
    jsonl_data.append(jsonl_entry)
# To create the JSONL file content:
# jsonl_content = "\n".join(json.dumps(entry) for entry in jsonl_data)
# print(jsonl_content)
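For reference, a single serialized line of the JSONL file would then look roughly like the sketch below; the key, prompt text, generationConfig values, and cache resource name are all placeholders, not values from your project.

import json

# Minimal sketch of one JSONL line: cachedContent sits at the same level
# as contents and generationConfig inside "request". All values are placeholders.
example_line = {
    "key": "example-key-1",
    "request": {
        "contents": [
            {"role": "user", "parts": [{"text": "Summarize the attached report."}]}
        ],
        "generationConfig": {"temperature": 0.2},
        "cachedContent": "projects/PROJECT_ID/locations/LOCATION/cachedContents/CACHE_ID",
    },
}
print(json.dumps(example_line))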
By making this change, your batch job should run correctly with cached context for each request.
Bad Request: {"error": {"code": 400, "message": "Model gemini-2.5-flash-001 does not support cached content with batch prediction.", "status": "INVALID_ARGUMENT"}}
Context caching is not currently supported with the Batch API. You can refer to the Context caching overview and its limitations for more details on these features.