REST API context caching

Hello, good day!

I’m currently working on an app and need to use context caching via REST.
I’m using the Gemini API, and the documentation doesn’t cover how to use a cache.
In the Vertex AI documentation it is pretty clear, for example:
{
  "cached_content": "projects/PROJECT_NUMBER/locations/LOCATION/cachedContents/CACHE_ID",
  "contents": [
    {"role": "user", "parts": [{"text": "PROMPT_TEXT"}]}
  ],
  ...
}

How do I do this with the Gemini API?
I’ve tried using it like this without any luck:
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "User input here"
        }
      ]
    }
  ],
  "cached_content": "Cache reference here"
}

Update: that format is correct, and I was able to get context caching working via REST.
The only issue is that it’s about 5x slower than Flash without caching, which is unfortunate, but it works nonetheless.
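
For anyone else trying this via REST, here is roughly the full request (a sketch only; the model name and cache ID are placeholders, and the model has to be the same one the cache was created with):

# Sketch: use an existing cache with generateContent on the Gemini API.
# Assumes GEMINI_API_KEY is set and the cache was created for gemini-1.5-flash-001.
curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-001:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      { "role": "user", "parts": [{ "text": "User input here" }] }
    ],
    "cached_content": "cachedContents/YOUR_CACHE_ID"
  }'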


Hi @deon, could you say more about this, please:

  • Does the code above cache the user text under the ‘cache reference’?
  • What if you want to cache multiple conversation turns?
  • And how do you use the cached content?

Many thanks!

The above shows how to use an existing cache reference via REST. The “text” here is just the user input for a conversation that uses the cache reference. For example, say you wanted to have a conversation with a Gemini bot about a document or book. You could upload the book, document, and/or any images/videos, plus a system prompt, to create a cache reference. That reference is then used as shown above to give the Gemini bot context before the conversation even starts; it has prior “knowledge”.

My post here was about how to use a cache reference via REST (since my application uses C++).
To learn how to create a cache reference, see the context caching documentation and try out the sample code there.
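
Roughly, creating one over REST looks like the sketch below. Treat it as untested: the model version, TTL, and placeholder content are my assumptions, not something from this thread.

# Sketch: create a cache by POSTing the reusable context to cachedContents.
# The model must be a version-pinned model that supports caching.
curl -s "https://generativelanguage.googleapis.com/v1beta/cachedContents?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "models/gemini-1.5-flash-001",
    "systemInstruction": {
      "parts": [{ "text": "You are an expert on the attached document." }]
    },
    "contents": [
      { "role": "user", "parts": [{ "text": "BOOK_OR_DOCUMENT_TEXT" }] }
    ],
    "ttl": "3600s"
  }'
# The response includes a "name" like "cachedContents/abc123"; that is the
# value you pass as "cached_content" in later generateContent calls.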

Let me know if you have more questions, thanks.

Ah, I see, thank you. I was hoping I could create a cache reference via REST as well, as you can with Claude. Perhaps you can, but there are no docs related to it.

No problem!
Claude’s looks similar, but they call it Prompt Caching and it seems to work a bit differently.
It’s in open beta, so it makes sense there isn’t much documentation yet, but I found this:
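
From what I can tell, instead of creating a standalone cache resource, you mark individual blocks as cacheable in the request itself. A sketch, with the beta header and field names taken from their announcement (so treat them as assumptions):

# Sketch: Anthropic's Prompt Caching marks content blocks with cache_control
# rather than referencing a separately created cache.
curl -s https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: prompt-caching-2024-07-31" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet-20240620",
    "max_tokens": 1024,
    "system": [
      {
        "type": "text",
        "text": "LARGE_DOCUMENT_TEXT",
        "cache_control": { "type": "ephemeral" }
      }
    ],
    "messages": [
      { "role": "user", "content": "User input here" }
    ]
  }'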