Gemini 2.5 Flash Live Implicit Context Caching Not Working / Feedback

I have been using the Google Gemini 2.5 Flash Live model to power my Home Assistant Sage AI feature for Homeway. The model is great: I've been pleased with the latency and with how well it handles function calling compared to OpenAI's Realtime API. 2.5 Flash is much better at breaking down the user's prompt and making function calls with correct parameter payloads.

However, token caching really helps keep my OpenAI costs down. I read the Gemini API article about how context caching works with Gemini 2.5 Flash, and implicit caching seems to work the way OpenAI's APIs do, where it's all automatic. In my OpenAI dashboard I can see the breakdown of cached vs. uncached tokens, and it's about 20-30% cached.

For the 2.5 Flash Live model, though, I hooked up telemetry to report the token counts returned in the usage metadata messages. I can see the tokens being used, but I never see any tokens reported as cached.
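
In case it helps, here's roughly the kind of hookup I mean, boiled down to a minimal sketch. It assumes the google-genai Python SDK, uses a placeholder model id, and assumes the Live server messages expose `usage_metadata` with a `cached_content_token_count` field (the field I'd expect implicit caching to populate):

```python
# Minimal sketch of the telemetry hookup (simplified; assumes the google-genai
# Python SDK, a placeholder model id, and that Live server messages expose
# usage_metadata with a cached_content_token_count field).
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

MODEL = "gemini-2.5-flash-live-preview"  # placeholder model id
SYSTEM_PROMPT = "..."                    # static system prompt, well over 1,024 tokens

async def main() -> None:
    config = {
        "response_modalities": ["TEXT"],
        "system_instruction": SYSTEM_PROMPT,
    }
    async with client.aio.live.connect(model=MODEL, config=config) as session:
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": "Turn on the kitchen lights"}]},
            turn_complete=True,
        )
        async for msg in session.receive():
            usage = getattr(msg, "usage_metadata", None)
            if usage is not None:
                # cached_content_token_count is where I'd expect implicit cache
                # hits to show up, but it always comes back empty for me.
                print(
                    "prompt:", usage.prompt_token_count,
                    "cached:", usage.cached_content_token_count,
                    "total:", usage.total_token_count,
                )

asyncio.run(main())
```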

All my Live sessions are text-only (in and out) and start with a static system prompt that's over the 1,024-token minimum for caching. It seems like the model / API should be able to figure out pretty easily that the first big block of text is constant and cacheable, but it never caches it.
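
For anyone who wants to double-check the same thing, this is roughly how I'd sanity-check the prompt length. It's a sketch that assumes the google-genai Python SDK and a hypothetical system_prompt.txt holding the prompt, and it uses the regular count_tokens endpoint as an approximation of how the Live session tokenizes it:

```python
# Quick sanity check (sketch): confirm the static system prompt alone clears
# the 1,024-token minimum for implicit caching on 2.5 Flash.
# Assumes the google-genai Python SDK and a hypothetical system_prompt.txt.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

with open("system_prompt.txt") as f:
    system_prompt = f.read()

resp = client.models.count_tokens(model="gemini-2.5-flash", contents=system_prompt)
print(f"system prompt tokens: {resp.total_tokens}")  # expect > 1,024
```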

So I have two questions:

  1. Is there any more guidance, beyond what's on the Context Caching page, that would help me make sure my sessions are as cacheable as possible?

  2. It would be great if the API gave developers more insight into the implicit caching logic so they could tune their sessions for better cache hits. For example, some indication of why the API / model isn't treating my static system prompt as cacheable would be really helpful.

Thanks!