Live API pricing

Hello,
As of today, these prices are listed:
Input: $0.35 (text), $2.10 (audio / image [video])
Output: $1.50 (text), $8.50 (audio)
But how are these calculated? Is audio context billed separately, or is it free with Live? No prices are given for maintaining the context during a conversation.
If context is not free, will there be caching, and what would it cost?
This wouldn't matter so much, but the prices are 3-4 times higher than standard Flash 2, and that significantly affects the project I am building with Gemini.
Can you give a rough estimate of how much a 15-minute conversation would cost in the AUDIO or TEXT modalities?
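
To make the question concrete, here is a rough Python sketch of the two billing models I can imagine. Everything except the four listed prices is an assumption on my side: that the prices are per 1M tokens, that audio costs about 32 tokens per second, that a 15-minute call splits into 30 turns of 15 s each way, and that re-billed context (if it is billed at all) is charged at the input rate. The function name and parameters are my own, not part of any API.

```python
# Listed prices from this post. ASSUMPTION: USD per 1M tokens (unit is not stated).
PRICE_INPUT_TEXT = 0.35
PRICE_INPUT_AUDIO = 2.10
PRICE_OUTPUT_TEXT = 1.50
PRICE_OUTPUT_AUDIO = 8.50


def conversation_cost(turns, in_tokens, out_tokens, in_price, out_price,
                      rebill_context):
    """Estimate the cost in USD of a multi-turn conversation.

    If rebill_context is True, each turn is also charged, at the input
    price, for every token already in the conversation (hypothetical
    billing model). If False, each token is charged exactly once.
    """
    total = 0.0
    context = 0  # tokens accumulated from all earlier turns
    for _ in range(turns):
        billed_input = in_tokens + (context if rebill_context else 0)
        total += billed_input * in_price / 1e6
        total += out_tokens * out_price / 1e6
        context += in_tokens + out_tokens
    return total


# ASSUMPTIONS for a 15-minute audio conversation (placeholders, not official).
AUDIO_TOKENS_PER_SECOND = 32
TURNS = 30
SECONDS_PER_SIDE = 15  # 30 turns * (15 s user + 15 s model) = 900 s
tokens_per_side = AUDIO_TOKENS_PER_SECOND * SECONDS_PER_SIDE

audio_once = conversation_cost(TURNS, tokens_per_side, tokens_per_side,
                               PRICE_INPUT_AUDIO, PRICE_OUTPUT_AUDIO, False)
audio_rebilled = conversation_cost(TURNS, tokens_per_side, tokens_per_side,
                                   PRICE_INPUT_AUDIO, PRICE_OUTPUT_AUDIO, True)

print(f"audio, context free:      ${audio_once:.4f}")
print(f"audio, context re-billed: ${audio_rebilled:.4f}")
```

The gap between the two results is what my question is about: if context is re-billed on every turn, the input side grows roughly quadratically with the number of turns, so the answer changes the estimate a lot. For TEXT, the same function can be called with the text prices and a text token count per turn.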
Thank you