I'm currently working on an app and need to use context caching via REST.
I’m using the Gemini API and the documentation doesn’t cover how to use a cache.
In the Vertex AI documentation it's pretty clear; for example:
{
  "cached_content": "projects/PROJECT_NUMBER/locations/LOCATION/cachedContents/CACHE_ID",
  "contents": [
    {"role": "user", "parts": [{"text": "PROMPT_TEXT"}]}
  ]
}
How do I do this with the Gemini API?
I’ve tried using it like this without any luck:
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "User input here"
        }
      ]
    }
  ],
  "cached_content": "Cache reference here"
}
Update: that format is correct; I was able to get context caching to work via REST.
The only issue is that it's about 5x slower than Flash without caching, which is unfortunate, but it works nonetheless.
The above shows how to use an existing cache reference via REST. The "text" here is just the user input for a conversation that uses the cache reference. For example, say you wanted to have a conversation with a Gemini bot about a document or book. You could upload the book, document, and/or any images/videos, plus a system prompt, to create a cache reference. That reference would then be used as shown above to give the Gemini bot context before the conversation even starts, so it has prior "knowledge".
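For completeness, here's the full request shape I'd expect around that payload. Treat this as a sketch rather than something from the docs: the gemini-1.5-flash-001 model name, the GEMINI_API_KEY variable, and the cachedContents/CACHE_ID value are placeholders/assumptions you'd swap for your own.

# Calls generateContent with an existing cache reference; as far as I can tell,
# the model has to match the one the cache was created with.
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-001:generateContent?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "User input here"}]}
    ],
    "cached_content": "cachedContents/CACHE_ID"
  }'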
My post here was about how to use a cache reference via REST (since my application uses C++).
To learn how to create a cache reference, see the documentation and try out the code:
Ah I see, thank you. I was hoping I could create a cache reference via REST as well, as you can with Claude. Perhaps you can, but there are no docs related to it.
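If it mirrors the Vertex resource, I'd guess creating one would look something like the below. To be clear, this is pure speculation on my part and untested; the cachedContents endpoint, the ttl field, and everything else here are assumptions:

# Speculative: POST the content to cache, plus an optional system instruction and TTL.
curl "https://generativelanguage.googleapis.com/v1beta/cachedContents?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "model": "models/gemini-1.5-flash-001",
    "systemInstruction": {"parts": [{"text": "You are an expert on this document."}]},
    "contents": [
      {"role": "user", "parts": [{"text": "LARGE_DOCUMENT_TEXT"}]}
    ],
    "ttl": "300s"
  }'

If that worked, I'd expect the response to include a name like cachedContents/SOME_ID that could then be passed as the cached_content value shown above.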
No problem!
Claude's looks similar, but they call it Prompt Caching and it seems to work a bit differently.
It's in open beta, so it makes sense there isn't much documentation yet, but I found this:
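From what I can tell there, instead of creating a standalone cache resource up front, you mark blocks of the prompt with cache_control and send a beta header on each request. Roughly like this, based on my reading, so double-check it against Anthropic's docs (the beta header value and model name are as I understood them from the announcement):

# Marks the large system block as cacheable; subsequent requests with the same
# prefix should hit the cache instead of reprocessing it.
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: prompt-caching-2024-07-31" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet-20240620",
    "max_tokens": 1024,
    "system": [
      {"type": "text", "text": "You are an expert on this book."},
      {"type": "text", "text": "LARGE_BOOK_TEXT", "cache_control": {"type": "ephemeral"}}
    ],
    "messages": [
      {"role": "user", "content": "User input here"}
    ]
  }'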