I’ve never seen a Gemini model cache anything. I just sent the same 1900 token message to Gemini 3 Flash several times, with a few seconds between requests, and nothing in usageMetadata ever indicates a cache hit.
I’ve been running benchmarks with the native google-genai SDK and found that the 1024 token threshold mentioned in the documentation is inaccurate. Your 1900 token prompt isn’t triggering caching because the actual activation floor is significantly higher.
In my tests, gemini-3-flash-preview and gemini-3.1-flash-lite-preview only showed the first cache hit at roughly 4192 tokens. For gemini-3.1-pro-preview, the threshold jumped to approximately 8161 tokens.
This behavior is problematic, since the usual minimum for prompt caching across the industry is around 1k tokens (4k at most). As things stand, prompts in the 1k to 4k range miss out on caching, which means unnecessary cost, latency, and energy use.
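If you want to probe the floor on your own account, here is a minimal sketch of how that could be done. It is not the exact benchmark behind the numbers above. It assumes the google-genai Python SDK, an API key in the environment, and that cache hits show up in `usage_metadata.cached_content_token_count`. The helper names (`cached_tokens`, `first_cache_hit`, `make_gemini_sender`) are made up for illustration, and the repeated filler text only approximates a token count, so check `prompt_token_count` for the real size.

```python
import time
from typing import Callable, Iterable, Optional


def cached_tokens(usage) -> int:
    """Return the cached token count from a usage-metadata object, or 0 if absent."""
    return getattr(usage, "cached_content_token_count", None) or 0


def first_cache_hit(
    send: Callable[[int], object],
    sizes: Iterable[int],
    pause_s: float = 0.0,
) -> Optional[int]:
    """Send each prompt size twice; return the first size whose repeat reports cached tokens."""
    for size in sizes:
        send(size)              # first request gives the service a chance to cache the prefix
        time.sleep(pause_s)     # wait a little before repeating the identical prompt
        usage = send(size)      # an identical repeat should report cached tokens if caching is active
        if cached_tokens(usage) > 0:
            return size
    return None


def make_gemini_sender(model: str) -> Callable[[int], object]:
    """Hypothetical sender built on google-genai; needs the package and an API key."""
    from google import genai  # imported lazily so the helpers above work without the SDK

    client = genai.Client()

    def send(size: int):
        # Rough filler: the word count is only a stand-in for the token count.
        prompt = "lorem " * size
        response = client.models.generate_content(model=model, contents=prompt)
        return response.usage_metadata

    return send
```

For example, `first_cache_hit(make_gemini_sender("gemini-3-flash-preview"), [1024, 2048, 4096, 8192], pause_s=3)` returns the smallest size in the list whose repeated request reported cached tokens, or `None` if none did. A finer grid of sizes narrows the floor down further.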
Wow, that’s very unfortunate, thanks for sharing your findings.
