Gemini 2.5 Flash Live Implicit Context Caching Not Working / Feedback

I have been using the Google Gemini 2.5 Flash Live model to power my Home Assistant Sage AI feature for Homeway. The model is great: I've been pleased with the latency and with how well it handles function calling compared to OpenAI's Realtime API. 2.5 Flash is much better at breaking down the user's prompt and making function calls with correct parameter payloads.

However, token caching really helps keep my OpenAI costs down. I read the Gemini API article about how context caching works with Gemini 2.5 Flash, and implicit caching seems to work the way OpenAI's APIs do, where it's all automatic. In my OpenAI dashboard I can see the breakdown of cached vs. uncached tokens, and it's about 20-30% cached.

For the 2.5 Flash Live model, though, I hooked up telemetry to report the token counts returned in the usage metadata messages. I can see the tokens being used, but I never see any tokens reported as cached.
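
In case it helps, here's roughly the kind of hookup I mean, boiled down to a minimal sketch. It assumes the google-genai Python SDK, uses a placeholder model id, and assumes the Live server messages expose `usage_metadata` with a `cached_content_token_count` field (the field I'd expect implicit caching to populate):

```python
# Minimal sketch of the telemetry hookup (simplified; assumes the google-genai
# Python SDK, a placeholder model id, and that Live server messages expose
# usage_metadata with a cached_content_token_count field).
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

MODEL = "gemini-2.5-flash-live-preview"  # placeholder model id
SYSTEM_PROMPT = "..."                    # static system prompt, well over 1,024 tokens

async def main() -> None:
    config = {
        "response_modalities": ["TEXT"],
        "system_instruction": SYSTEM_PROMPT,
    }
    async with client.aio.live.connect(model=MODEL, config=config) as session:
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": "Turn on the kitchen lights"}]},
            turn_complete=True,
        )
        async for msg in session.receive():
            usage = getattr(msg, "usage_metadata", None)
            if usage is not None:
                # cached_content_token_count is where I'd expect implicit cache
                # hits to show up, but it always comes back empty for me.
                print(
                    "prompt:", usage.prompt_token_count,
                    "cached:", usage.cached_content_token_count,
                    "total:", usage.total_token_count,
                )

asyncio.run(main())
```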

All my Live sessions are text-only (in and out) and start with a static system prompt that's over the 1,024-token minimum for caching. It seems like the model / API should be able to figure out pretty easily that the first big block of text is constant and cacheable, but it never caches it.
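
For anyone who wants to double-check the same thing, this is roughly how I'd sanity-check the prompt length. It's a sketch that assumes the google-genai Python SDK and a hypothetical system_prompt.txt holding the prompt, and it uses the regular count_tokens endpoint as an approximation of how the Live session tokenizes it:

```python
# Quick sanity check (sketch): confirm the static system prompt alone clears
# the 1,024-token minimum for implicit caching on 2.5 Flash.
# Assumes the google-genai Python SDK and a hypothetical system_prompt.txt.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

with open("system_prompt.txt") as f:
    system_prompt = f.read()

resp = client.models.count_tokens(model="gemini-2.5-flash", contents=system_prompt)
print(f"system prompt tokens: {resp.total_tokens}")  # expect > 1,024
```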

So I have two questions:

  1. Is there any more guidance, beyond what's on the Context Caching page, that would help me make sure my sessions are as cacheable as possible?

  2. It would be great if the API gave developers more insight into the implicit caching logic so they could tune their sessions for better cache hits. For example, some indication of why the API / model isn't treating my static system prompt as cacheable would be really helpful.

Thanks!