System instruction and implicit caching question

Hey everyone,

I’m building a product using the Gemini API, and I’m really hoping to leverage implicit caching to reduce the (very) high API costs. However, there’s not much detailed documentation about how it actually works, so I wanted to ask here in case anyone knows.

Specifically — does the system instruction (the part that’s fixed at the beginning of the prompt) count as part of what’s being cached implicitly? Or is it treated separately and excluded from implicit caching?

Any clarification would be super appreciated. Thanks!

Hi @komin,

Implicit caching is enabled by default for all Gemini 2.5 models, and the system instruction counts as part of the cached prefix.
Please refer to https://ai.google.dev/gemini-api/docs/caching?lang=node#implicit-caching.
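
To give implicit caching the best chance of a hit, keep the shared part (the system instruction plus any common context) identical across requests and put the variable, per-user part at the end. Here is a minimal sketch with the @google/genai Node SDK (the model name and prompt strings are placeholders); a non-zero `usageMetadata.cachedContentTokenCount` means part of your prefix was served from cache:

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Keep the large, stable part (system instruction + shared context) identical
// across requests, and put the variable part at the end of `contents`.
const response = await ai.models.generateContent({
  model: "gemini-2.5-pro",
  contents: "…shared context…\n\nUser question goes here",
  config: {
    systemInstruction: "Your long, fixed system instruction…",
  },
});

// Non-zero when part of the prefix was served from the implicit cache.
console.log("cached tokens:", response.usageMetadata?.cachedContentTokenCount);
console.log(response.text);
```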

Let me know if you have any further questions.

Hi, thanks for your answer. Does Gemini 2.5 Pro require at least 4,096 tokens or 2,048 tokens for implicit caching to work? I’ve seen some documents mention 2,048 and others 4,096. Also, are there any troubleshooting steps I can take if I’ve already met all the requirements but implicit caching still doesn’t seem to activate? That seems to be my case.
I might have to fall back to explicit caching if no troubleshooting steps are available, roughly along the lines of the sketch below.
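
For reference, the fallback I have in mind looks roughly like this, based on the explicit caching section of that same docs page (just a sketch; the TTL and the placeholder strings are mine):

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Create an explicit cache holding the fixed prefix once…
const cache = await ai.caches.create({
  model: "gemini-2.5-pro",
  config: {
    systemInstruction: "Your long, fixed system instruction…",
    contents: "…large shared context…",
    ttl: "3600s", // keep the cache alive for an hour
  },
});

// …then reference it on every request instead of resending the prefix.
const response = await ai.models.generateContent({
  model: "gemini-2.5-pro",
  contents: "User question goes here",
  config: { cachedContent: cache.name },
});

console.log(response.text);
```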
Thanks a lot.

Hi @komin,

For Gemini 2.5 Pro, the minimum is 4,096 tokens; the 2,048 figure applies to 3 Pro Preview, which may explain the conflicting numbers you’ve seen. The minimum input token count for context caching is listed below for each model:

| Model | Min token limit |
| --- | --- |
| 3 Pro Preview | 2,048 |
| 2.5 Pro | 4,096 |
| 2.5 Flash | 1,024 |
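
As a troubleshooting step, you can confirm your prompt actually clears the threshold before expecting an implicit hit. A small sketch with the @google/genai Node SDK (the system instruction counts toward the prefix too, so this count of the contents alone is a lower bound):

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Count tokens in the request body before sending it. The system
// instruction also belongs to the cached prefix, so the effective
// prefix size is at least this count plus the instruction's tokens.
const { totalTokens } = await ai.models.countTokens({
  model: "gemini-2.5-pro",
  contents: "…your long shared context plus the user question…",
});

console.log(`Prompt tokens: ${totalTokens}`); // needs >= 4,096 for 2.5 Pro
```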

Please let us know if you have any further queries.
