Has anyone gotten implicit caching to work?

Sven_Ostertag · April 28, 2026, 11:22am

I’ve never seen a Gemini model cache anything. I just sent the same 1900-token message to Gemini 3 Flash several times, with a few seconds between requests, but I never see anything in usageMetadata indicating caching.
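For anyone who wants to check the same thing, here is a minimal sketch of how the usage block of a response can be read. It assumes the REST-style field names `promptTokenCount` and `cachedContentTokenCount` inside `usageMetadata`, and that the cached field is missing or null when nothing was served from cache. The helper name `cache_stats` and the two sample payloads are made up for illustration and are not real API output.

```python
from typing import Any, Mapping


def cache_stats(usage: Mapping[str, Any]) -> tuple[int, int, float]:
    """Return (prompt_tokens, cached_tokens, cached_fraction) for one response.

    `usage` is the usageMetadata object of a response, as a plain dict.
    A missing or null cachedContentTokenCount is treated as a cache miss.
    """
    prompt = int(usage.get("promptTokenCount") or 0)
    cached = int(usage.get("cachedContentTokenCount") or 0)
    fraction = cached / prompt if prompt else 0.0
    return prompt, cached, fraction


# Illustrative payloads only (assumed shapes, not captured from the API).
miss = {"promptTokenCount": 1900, "candidatesTokenCount": 12}
hit = {"promptTokenCount": 5000, "cachedContentTokenCount": 4000}

print(cache_stats(miss))  # (1900, 0, 0.0)
print(cache_stats(hit))   # (5000, 4000, 0.8)
```

With the google-genai Python SDK the same numbers should be available as `response.usage_metadata.prompt_token_count` and `response.usage_metadata.cached_content_token_count`.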

mbusana · May 4, 2026, 6:51pm

I’ve been running benchmarks with the native google-genai SDK and found that the 1024-token threshold mentioned in the documentation is inaccurate. Your 1900-token prompt isn’t triggering caching because the actual activation floor is significantly higher.

In my tests, gemini-3-flash-preview and gemini-3.1-flash-lite-preview only showed the first cache hit at roughly 4192 tokens. For gemini-3.1-pro-preview, the threshold jumped to approximately 8161 tokens.
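A sweep like this can be sketched roughly as follows. This is not the exact benchmark script, just the general shape: try increasing prompt sizes and report the first one where a repeated request comes back with cached tokens. The names `first_cache_hit` and `fake_probe` are hypothetical, and `fake_probe` only simulates a floor of 4192 tokens so the snippet runs without network access.

```python
from typing import Callable, Iterable, Optional


def first_cache_hit(sizes: Iterable[int], probe: Callable[[int], int]) -> Optional[int]:
    """Return the smallest tested prompt size for which probe() reports cached tokens.

    probe(size) is expected to send a prompt of `size` tokens at least twice
    and return the cached token count reported for the repeat request.
    """
    for size in sorted(sizes):
        if probe(size) > 0:
            return size
    return None


SIMULATED_FLOOR = 4192  # assumption for the demo, taken from the Flash figure above


def fake_probe(size: int) -> int:
    """Offline stand-in for a real API call: caches only at or above the floor."""
    return size if size >= SIMULATED_FLOOR else 0


# Test sizes from 1024 to 9216 tokens in steps of 512.
print(first_cache_hit(range(1024, 9217, 512), fake_probe))
```

A real probe would call `client.models.generate_content(...)` twice with the same prompt and return `response.usage_metadata.cached_content_token_count or 0` from the second response. The step size limits the resolution, so a finer sweep around the first hit is needed to pin down the floor more precisely.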

This behavior is problematic since the industry standard minimum for prompt caching is typically 1k tokens (4k at most). Currently, prompts in the 1k to 4k token range miss out on caching, which leads to unnecessary cost, latency, and energy waste.

Sven_Ostertag · May 5, 2026, 4:01pm

Wow, that’s very unfortunate, thanks for sharing your findings.
