Gemini Live Caching

Is audio context billed at the same $2.10 per million tokens? Is audio context caching planned? Thanks

@IKGN, audio is processed at 32 tokens/sec, and the charge is based on the token cost of the model you are using.

As for caching, you can cache the audio; please check the reference here: Context caching  |  Gemini API  |  Google AI for Developers

Thanks, but I’m asking about Gemini Live, specifically the internal audio context of that model. The complete cumulative context seems to be billed on each conversation turn, not just the new audio I send during that turn. I don’t think I can currently cache this audio?

Hey @IKGN - for billing purposes in the Live API, all tokens are counted at every turn, including new tokens from the latest prompt and tokens from the previous context.

I’ll file this (the ability to cache tokens and only be billed for new tokens) with the team as a FR.


Thank you very much! It makes a huge price difference: the quadratically increasing audio-context cost dominates all other costs. My app needs long conversations, so if the context could be cached, it would help me a lot!

This would be super helpful. Without this, live audio is basically unusable for a lot of use cases. A single minute of live audio can consume 20k tokens or even more, depending on the prompt etc. Google’s pricing suggests that 1 minute should cost about 1,920 tokens, but in reality token consumption explodes, because every single turn reprocesses all previous audio plus the instructions and tools (I think).
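To make the gap concrete, here is a rough back-of-the-envelope sketch. The 32 tokens/sec audio rate comes from the reply above; the turn length and turn count are made-up assumptions, and the billing model (full context re-billed every turn) is the behavior described in this thread, not official documentation:

```python
# Rough sketch of cumulative Live API audio billing.
# Assumption (per this thread): the full accumulated context is
# re-counted and billed on every conversation turn.

AUDIO_TOKENS_PER_SEC = 32   # stated audio tokenization rate
TURN_SECONDS = 10           # hypothetical: one 10-second user turn
TURNS = 36                  # hypothetical: 6 minutes of conversation

# New audio tokens added per turn
new_per_turn = AUDIO_TOKENS_PER_SEC * TURN_SECONDS  # 320 tokens

# If only new audio were billed, totals would grow linearly
# (this matches the ~1,920 tokens/min figure from pricing):
linear_total = new_per_turn * TURNS

# If the whole accumulated context is billed each turn, the total is
# new_per_turn * (1 + 2 + ... + TURNS), i.e. quadratic in length:
cumulative_total = sum(new_per_turn * t for t in range(1, TURNS + 1))

print(linear_total)      # 11520 tokens (~1,920 tokens per minute)
print(cumulative_total)  # 213120 tokens (~18.5x more)
```

Under these assumptions, a 6-minute conversation is billed for roughly 18x more audio tokens than the per-minute pricing alone would suggest, which is why caching the context would matter so much for long sessions.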
