I am using the following request body:
{'cachedContent': 'projects/*******/locations/us-central1/cachedContents/******',
 'contents': [{'parts': [{'text': '.'}], 'role': 'user'}],
 'generationConfig': {'candidateCount': 1,
                      'maxOutputTokens': 65534,
                      'temperature': 0,
                      'topP': 0.95},
 'safetySettings': [{'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'threshold': 'OFF'},
                    {'category': 'HARM_CATEGORY_HATE_SPEECH', 'threshold': 'OFF'},
                    {'category': 'HARM_CATEGORY_HARASSMENT', 'threshold': 'OFF'},
                    {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'threshold': 'OFF'}]}
But I get this error:
Bad Request: {"error": {"code": 400, "message": "Model gemini-2.5-flash-001 does not support cached content with batch prediction.", "status": "INVALID_ARGUMENT"}}
Hi @Shreyansh_Bardia,
Batch prediction does not support explicit caching. Batch prediction is optimized for high-throughput, asynchronous processing of a large number of prompts at a reduced cost (see "Why use batch prediction?" in the documentation), while explicit context caching is designed for real-time or near-real-time scenarios where you manually manage and reuse a large context across multiple individual requests to reduce latency and cost.
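For comparison, here is a minimal sketch of how explicit caching is typically used with individual online requests (assuming the google-genai SDK; the project, location, model ID, TTL, and prompt text below are placeholders, not your exact setup):

# Minimal sketch of explicit context caching with individual online requests.
from google import genai
from google.genai import types

# Placeholder project/location values.
client = genai.Client(vertexai=True, project="your-project", location="us-central1")

# Store the large shared context once in a cache.
cache = client.caches.create(
    model="gemini-2.5-flash",
    config=types.CreateCachedContentConfig(
        contents=[
            types.Content(role="user", parts=[types.Part.from_text(text="<large shared context>")])
        ],
        ttl="3600s",
    ),
)

# Reuse the cache across separate (non-batch) requests.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="First input against the cached context",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)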
Okay, but it would be great if caching were supported in batch prediction as well. We have a prompt of around 10k tokens that we would like to run against different inputs, and caching would help a lot in such scenarios.
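To illustrate the redundancy (a hypothetical sketch; the file name, per-request inputs, and exact batch JSONL fields are placeholders): without caching, every line of the batch input has to carry the same ~10k-token prompt verbatim.

import json

# Hypothetical sketch: the shared ~10k-token prompt is repeated in every request line.
shared_prompt = "<~10k-token shared prompt>"
inputs = ["input A", "input B", "input C"]  # placeholder per-request inputs

with open("batch_requests.jsonl", "w") as f:
    for item in inputs:
        line = {
            "request": {
                "contents": [
                    {"role": "user", "parts": [{"text": f"{shared_prompt}\n\n{item}"}]}
                ],
                "generationConfig": {"temperature": 0, "topP": 0.95},
            }
        }
        f.write(json.dumps(line) + "\n")

If cached content were supported in batch prediction, each line would only need the cachedContent reference plus the per-request input.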
Sure @Shreyansh_Bardia,
I will raise this as a feature request with the concerned team. Thanks for sharing your use case as an example. If you would like to elaborate on it, with an estimate of the cost savings you would expect from explicit context caching in batch prediction, that would help the team.