I'm currently working on an app and need to use context caching via REST.
I’m using the Gemini API and the documentation doesn’t cover how to use a cache.
In the Vertex AI documentation it's pretty clear; for example:
{
  "cached_content": "projects/PROJECT_NUMBER/locations/LOCATION/cachedContents/CACHE_ID",
  "contents": [
    {"role": "user", "parts": [{"text": "PROMPT_TEXT"}]}
  ]
}
How do I do this with the Gemini API?
I’ve tried using it like this without any luck:
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "User input here"
        }
      ]
    }
  ],
  "cached_content": "Cache reference here"
}
Update: that format is correct; I was able to get context caching to work via REST.
The only issue is that it's about 5x slower than Flash without caching, which is unfortunate, but it works nonetheless.
The above shows how to use an existing cache reference via REST. The "text" here is just the user input for a conversation that uses the cache reference. For example, say you wanted to have a conversation with a Gemini bot about a document or book. You could upload the book, document, and/or any images/videos, plus a system prompt, to create a cache reference. That reference would then be used as shown above to give the Gemini bot context before the conversation even starts, so it has prior "knowledge".
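For completeness, here's the full request shape I'd expect around that payload. Treat this as a sketch rather than something from the docs: the gemini-1.5-flash-001 model name, the GEMINI_API_KEY variable, and the cachedContents/CACHE_ID value are placeholders/assumptions you'd swap for your own.

# Calls generateContent with an existing cache reference; as far as I can tell,
# the model has to match the one the cache was created with.
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-001:generateContent?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "User input here"}]}
    ],
    "cached_content": "cachedContents/CACHE_ID"
  }'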
My post here was about how to use a cache reference via REST (since my application uses C++).
To learn how to create a cache reference, see the documentation and try out the code:
Ah I see, thank you. I was hoping I could create a cache reference via REST as well, as you can with Claude. Perhaps you can, but there are no docs related to it.
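If it mirrors the Vertex resource, I'd guess creating one would look something like the below. To be clear, this is pure speculation on my part and untested; the cachedContents endpoint, the ttl field, and everything else here are assumptions:

# Speculative: POST the content to cache, plus an optional system instruction and TTL.
curl "https://generativelanguage.googleapis.com/v1beta/cachedContents?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "model": "models/gemini-1.5-flash-001",
    "systemInstruction": {"parts": [{"text": "You are an expert on this document."}]},
    "contents": [
      {"role": "user", "parts": [{"text": "LARGE_DOCUMENT_TEXT"}]}
    ],
    "ttl": "300s"
  }'

If that worked, I'd expect the response to include a name like cachedContents/SOME_ID that could then be passed as the cached_content value shown above.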
No problem!
Claude's looks similar, but they call it Prompt Caching and it seems to work a bit differently.
It's in open beta, so it makes sense there isn't much documentation yet, but I found this:
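From what I can tell there, instead of creating a standalone cache resource up front, you mark blocks of the prompt with cache_control and send a beta header on each request. Roughly like this, based on my reading, so double-check it against Anthropic's docs (the beta header value and model name are as I understood them from the announcement):

# Marks the large system block as cacheable; subsequent requests with the same
# prefix should hit the cache instead of reprocessing it.
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: prompt-caching-2024-07-31" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet-20240620",
    "max_tokens": 1024,
    "system": [
      {"type": "text", "text": "You are an expert on this book."},
      {"type": "text", "text": "LARGE_BOOK_TEXT", "cache_control": {"type": "ephemeral"}}
    ],
    "messages": [
      {"role": "user", "content": "User input here"}
    ]
  }'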