System instruction and implicit caching question

Hey everyone,

I’m building a product using the Gemini API, and I’m really hoping to leverage implicit caching to reduce the (very) high API costs. However, there’s not much detailed documentation about how it actually works, so I wanted to ask here in case anyone knows.

Specifically — does the system instruction (the part that’s fixed at the beginning of the prompt) count as part of what’s being cached implicitly? Or is it treated separately and excluded from implicit caching?

Any clarification would be super appreciated. Thanks!

Hi @komin,

Implicit caching is enabled by default for all Gemini 2.5 models, and the system instruction counts as part of the cached prefix.
Please refer to https://ai.google.dev/gemini-api/docs/caching?lang=node#implicit-caching.
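
To give implicit caching the best chance of a hit, keep the shared part (the system instruction plus any common context) identical across requests and put the variable, per-user part at the end. Here is a minimal sketch with the @google/genai Node SDK (the model name and prompt strings are placeholders); a non-zero `usageMetadata.cachedContentTokenCount` means part of your prefix was served from cache:

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Keep the large, stable part (system instruction + shared context) identical
// across requests, and put the variable part at the end of `contents`.
const response = await ai.models.generateContent({
  model: "gemini-2.5-pro",
  contents: "…shared context…\n\nUser question goes here",
  config: {
    systemInstruction: "Your long, fixed system instruction…",
  },
});

// Non-zero when part of the prefix was served from the implicit cache.
console.log("cached tokens:", response.usageMetadata?.cachedContentTokenCount);
console.log(response.text);
```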

Let me know if you have any further questions.

Hi, thanks for your answer. Does Gemini 2.5 Pro require at least 4,096 tokens or 2,048 tokens for implicit caching to work? I’ve seen some documents mention 2,048 and others 4,096. Also, are there any troubleshooting steps I can take if I’ve already met all the requirements but implicit caching still doesn’t seem to activate? That seems to be my case.
I might have to fall back to explicit caching if no troubleshooting steps are available, roughly along the lines of the sketch below.
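
For reference, the fallback I have in mind looks roughly like this, based on the explicit caching section of that same docs page (just a sketch; the TTL and the placeholder strings are mine):

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Create an explicit cache holding the fixed prefix once…
const cache = await ai.caches.create({
  model: "gemini-2.5-pro",
  config: {
    systemInstruction: "Your long, fixed system instruction…",
    contents: "…large shared context…",
    ttl: "3600s", // keep the cache alive for an hour
  },
});

// …then reference it on every request instead of resending the prefix.
const response = await ai.models.generateContent({
  model: "gemini-2.5-pro",
  contents: "User question goes here",
  config: { cachedContent: cache.name },
});

console.log(response.text);
```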
Thanks a lot.

Hi @komin,

For Gemini 2.5 Pro, the minimum is 4,096 tokens; the 2,048 figure applies to 3 Pro Preview, which may explain the conflicting numbers you’ve seen. The minimum input token count for context caching is listed below for each model:

| Model | Min token limit |
| --- | --- |
| 3 Pro Preview | 2,048 |
| 2.5 Pro | 4,096 |
| 2.5 Flash | 1,024 |
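
As a troubleshooting step, you can confirm your prompt actually clears the threshold before expecting an implicit hit. A small sketch with the @google/genai Node SDK (the system instruction counts toward the prefix too, so this count of the contents alone is a lower bound):

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Count tokens in the request body before sending it. The system
// instruction also belongs to the cached prefix, so the effective
// prefix size is at least this count plus the instruction's tokens.
const { totalTokens } = await ai.models.countTokens({
  model: "gemini-2.5-pro",
  contents: "…your long shared context plus the user question…",
});

console.log(`Prompt tokens: ${totalTokens}`); // needs >= 4,096 for 2.5 Pro
```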

Please let us know if you have any further queries.
