Unfortunately, over the past few days we have been receiving increased spend and latency alerts because caching has stopped working. Our data clearly shows that it was working up until last week but has not worked for us since.
I can confirm that the requests are at least 2048 tokens and that caching is still working for the Gemini 2.5 Pro and Flash models. Is there a way to get this working again for Gemini 3?
Hey! For 3 Pro, the min context size for cache hits is 4096 tokens; updating the docs now to reflect this.
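For anyone else hitting this, one way to check whether a request clears the threshold, and whether a hit actually landed, is to count the prompt tokens up front and then inspect the cached token count in the response metadata. A minimal sketch with the google-genai Python SDK; the model id and prompt strings are placeholders:

```python
# Sketch: check implicit-cache eligibility and hits with the google-genai SDK.
# Assumes GEMINI_API_KEY is set in the environment.
from google import genai

client = genai.Client()
MODEL = "gemini-3-pro-preview"  # placeholder model id
MIN_TOKENS_FOR_CACHE = 4096     # per the reply above; previously documented as 2048

prompt = "<your long shared prefix>" + "<your per-request suffix>"

# Count tokens first so you know whether the prefix can be cached at all.
count = client.models.count_tokens(model=MODEL, contents=prompt)
print(f"prompt tokens: {count.total_tokens} "
      f"(min for cache hits: {MIN_TOKENS_FOR_CACHE})")

response = client.models.generate_content(model=MODEL, contents=prompt)

# usage_metadata reports how many prompt tokens were served from cache;
# 0 or None here means no implicit cache hit for this request.
usage = response.usage_metadata
print(f"cached tokens: {usage.cached_content_token_count or 0} "
      f"of {usage.prompt_token_count}")
```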
Hi Logan,
This is a classic example of a bait and switch. At launch the minimum cache limit was 2048 tokens; now, all of a sudden, it's 4096? The exact same thing happened with Gemini 2.5 Pro, where at launch the limit was advertised by you and the blog as 2048, and then it was silently changed to 4096.
What makes matters worse is that Vertex AI offers both Gemini 2.5 Pro and Gemini 3 at a 2048-token minimum.
In a world where Anthropic offers very easy explicit context caching and OpenAI offers very accurate implicit/auto caching, Google's offering is by far the most complicated and least predictable to use.
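For contrast, here is roughly what the explicit approach looks like on Anthropic's side: you mark the shared prefix with `cache_control`, and the usage fields report exactly what was written to and read from the cache. A sketch, not production code; the model id and prompt strings are placeholders:

```python
# Sketch of Anthropic's explicit prompt caching: mark the shared prefix
# with cache_control and read the cache stats back from the usage object.
# Assumes ANTHROPIC_API_KEY is set in the environment.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model id
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "<large shared system prompt / documents>",
            "cache_control": {"type": "ephemeral"},  # cache everything up to here
        }
    ],
    messages=[{"role": "user", "content": "<per-request question>"}],
)

# Explicit caching means no guessing: usage reports cache activity directly.
print(f"written to cache: {response.usage.cache_creation_input_tokens}")
print(f"read from cache:  {response.usage.cache_read_input_tokens}")
```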
We will likely migrate to another provider soon if the state of context caching does not improve.
Thank you for all your hard work