We have started trying out the implicit caching behavior on the Gemini 2.5 Pro models. Our system prompt is over 2,048 tokens and is always static. Even when we send two requests back to back (around 10 seconds in total: roughly 4 seconds for the first call, a 2-second gap, then 4 seconds for the second), the cache_read property on usage_metadata is 0. The documentation states that no parameter needs to be set to enable this, but that does not seem to be the case for us. The model we are using is gemini-2.5-pro-05-06.
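A stripped-down sketch of the check we are running looks roughly like this (the prompt text is a placeholder, not our real prompt; we call the model through langchain-google-genai, detailed further down):

```python
# Minimal repro sketch: the same static messages sent twice, a couple of seconds apart.
import time

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.5-pro-05-06")

messages = [
    SystemMessage(content="..."),  # static system prompt, well over 2048 tokens
    HumanMessage(content="..."),   # identical user prompt on both requests
]

for attempt in (1, 2):
    response = llm.invoke(messages)  # each call takes roughly 4 seconds
    details = (response.usage_metadata or {}).get("input_token_details", {})
    print(f"attempt {attempt}: cache_read={details.get('cache_read')}")  # stays 0
    time.sleep(2)
```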
Hey Emir,
Thanks for reaching out and detailing your experience with implicit caching on Gemini 2.5 Pro. I understand how frustrating it can be when a feature isn’t behaving as expected, especially with a static, large prompt like yours.
It’s interesting that usage_metadata.cache_read is returning 0 even after multiple sequential requests. You’re absolutely right that the documentation suggests that implicit caching should “just work” without explicit parameter settings for static contexts, which your setup clearly fits.
To help us dig into this a bit more effectively, could you clarify a couple of things?
1. “Static” prompt: You mentioned your prompt is “static.” Does this mean the entire prompt, including the user input within the conversation, remains unchanged across those two requests, or just the system prompt portion?
2. Request content: Are the two requests identical in terms of prompt structure and content, or are there any subtle differences in the user turns or other elements?
3. API client/SDK: Are you using a specific Google-provided client library (e.g., the Python SDK, the Node.js SDK) or making direct REST API calls? And which version?
4. Full prompt structure (anonymized): Without revealing sensitive info, could you give a high-level idea of how your prompt is structured? For example, is it a single system message, or are there multiple parts?
Also, just to confirm, you’re using gemini-2.5-pro-05-06, which is great to know.
We’re actively working to improve the caching mechanisms, and your detailed feedback is invaluable. We appreciate your patience as we look into this!
Thanks.
Hello Deepakishore,
- The prompt is fully static; the entire system prompt and the user prompt are the same between the two requests.
- They are exactly the same.
- We are using langchain-google-genai with the following versions:
- langchain 0.3.24
- langchain-google-genai 2.1.5
- google-genai 1.20.0
- It is a single long system prompt that instructs the agent on how to act. The output is produced with with_structured_output, which acts as a forced tool call on Gemini models in LangChain (sketched below).
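Roughly, the structured-output part of the setup looks like this (placeholder schema and prompt text, not our real ones):

```python
# Sketch of the structured-output setup; include_raw=True keeps the underlying
# AIMessage available so usage_metadata (and cache_read) stays visible.
from pydantic import BaseModel
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_google_genai import ChatGoogleGenerativeAI

class AgentOutput(BaseModel):  # stand-in for our real output schema
    answer: str

llm = ChatGoogleGenerativeAI(model="gemini-2.5-pro-05-06")
# On Gemini models, LangChain implements with_structured_output as a forced tool call.
structured_llm = llm.with_structured_output(AgentOutput, include_raw=True)

result = structured_llm.invoke([
    SystemMessage(content="..."),  # the long static system prompt
    HumanMessage(content="..."),
])
# result is a dict: {"raw": AIMessage, "parsed": AgentOutput, "parsing_error": None}
usage = result["raw"].usage_metadata or {}
print(usage.get("input_token_details", {}).get("cache_read"))
```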
Hi @Emir_Arditi,
Thank you so much for the detailed and clear answers to my questions! This information is incredibly helpful in understanding the context of the implicit caching issue you’re experiencing.
To recap, you’ve confirmed that:
- Your entire prompt (both system and user parts) is indeed static and identical across sequential requests.
- You’re specifically using langchain-google-genai, with langchain 0.3.24, langchain-google-genai 2.1.5, and google-genai 1.20.0.
- The prompt itself is a single long system prompt, and you’re leveraging with_structured_output for a forced tool call.
This is a really interesting edge case, especially with the use of with_structured_output and the LangChain integration. While implicit caching is designed to handle static prompts, there might be specific interactions with structured output, or with the way these libraries build the request payload, that could be affecting its behavior.
We’re actively looking into how with_structured_output and tool calling might interact with our caching mechanisms, and your use case is providing valuable data. We really appreciate your patience and cooperation as we debug this.
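In the meantime, one way to narrow this down on your side (purely a sketch with placeholder prompts, not an official diagnostic) would be to run the same two-request test directly against the google-genai SDK, with no tools bound. If cached tokens show up there but not through the LangChain path, the interaction is likely in how the request payload is built; if they don’t, that points at the caching behavior itself.

```python
# Rough isolation test: bypass LangChain and call the API directly, then check
# whether the second request reports any cached prompt tokens.
import time

from google import genai
from google.genai import types

client = genai.Client()  # assumes GOOGLE_API_KEY is set in the environment

SYSTEM_PROMPT = "..."  # the same static system prompt (> 2048 tokens)
USER_PROMPT = "..."    # the same user prompt on both requests

for attempt in (1, 2):
    response = client.models.generate_content(
        model="gemini-2.5-pro-05-06",  # model id as given above
        contents=USER_PROMPT,
        config=types.GenerateContentConfig(system_instruction=SYSTEM_PROMPT),
    )
    usage = response.usage_metadata
    # At the API level, implicit cache hits show up as cached_content_token_count.
    print(f"attempt {attempt}: cached={usage.cached_content_token_count} "
          f"prompt={usage.prompt_token_count}")
    time.sleep(2)
```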
Thanks again for the excellent detail!