Could someone help me understand Gemini Live API pricing?

Hey, I checked both the Vertex AI and Gemini AI pricing pages.
They both state that 1 second of audio is 32 tokens.
In my app I stream audio in and get text out.

The Live API price is $2.10 per million tokens. At 32 tokens per second, 1 hour of audio should be 3,600 × 32 = 115,200 tokens, or about $0.24. At least that's how I calculated it (my output is really short, around 120-150 tokens, so it's not even worth including).
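Here is the same arithmetic as a small Python sketch, in case I'm making a mistake somewhere. It only uses the two figures quoted above (32 tokens per second and $2.10 per million tokens); the function name is just something I made up for illustration, and it assumes audio input is the only thing being billed.

```python
# Back-of-the-envelope estimate using only the figures quoted in this post.
TOKENS_PER_AUDIO_SECOND = 32       # from the pricing pages
PRICE_PER_MILLION_TOKENS = 2.10    # USD, the Live API rate quoted above


def estimate_audio_input_cost(seconds):
    """Return (tokens, cost in USD) for a given duration of streamed audio.

    Assumption: only audio input tokens are billed, all at one flat rate.
    """
    tokens = int(seconds * TOKENS_PER_AUDIO_SECOND)
    cost = tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS
    return tokens, cost


tokens_1h, cost_1h = estimate_audio_input_cost(3600)
tokens_2h, cost_2h = estimate_audio_input_cost(2 * 3600)
print(f"1 h: {tokens_1h} tokens, ${cost_1h:.2f}")
print(f"2 h: {tokens_2h} tokens, ${cost_2h:.2f}")
```

Even at 2 hours this estimate stays under $0.50, which is why the number on my bill surprises me.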

In practice I had about 1 h 30 min to 2 h of Live API usage and was billed $5, roughly ten times what I expected.

One thing I'd love to understand: what is the TEXT modality entry in promptTokensDetails?
And how does one calculate the actual cost of the Live API?

{
  "serverContent": {
    "turnComplete": true
  },
  "usageMetadata": {
    "promptTokenCount": 541,
    "responseTokenCount": 156,
    "totalTokenCount": 697,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 410
      },
      {
        "modality": "AUDIO",
        "tokenCount": 131
      }
    ],
    "responseTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 156
      }
    ]
  }
}
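For what it's worth, this is roughly how I would tally the prompt tokens per modality from messages like the one above. It is only a sketch: `tally_prompt_tokens` and `estimate_cost` are my own hypothetical helpers, and applying the single $2.10 rate to every input modality is an assumption on my part (the real rates may differ per modality). I also don't know whether `usageMetadata` is reported per turn or cumulatively for the session, which would change how the messages should be summed.

```python
from collections import Counter

# The usageMetadata from the message above, trimmed to the fields used here.
sample_message = {
    "usageMetadata": {
        "promptTokenCount": 541,
        "responseTokenCount": 156,
        "promptTokensDetails": [
            {"modality": "TEXT", "tokenCount": 410},
            {"modality": "AUDIO", "tokenCount": 131},
        ],
    }
}


def tally_prompt_tokens(messages):
    """Sum promptTokensDetails by modality across parsed server messages."""
    totals = Counter()
    for message in messages:
        usage = message.get("usageMetadata", {})
        for detail in usage.get("promptTokensDetails", []):
            totals[detail["modality"]] += detail["tokenCount"]
    return totals


def estimate_cost(totals, usd_per_million_by_modality):
    """Price each modality's token total at the rate supplied by the caller."""
    return sum(
        count / 1_000_000 * usd_per_million_by_modality[modality]
        for modality, count in totals.items()
    )


totals = tally_prompt_tokens([sample_message])
# Assumption: the same $2.10 per million rate for both modalities.
assumed_rates = {"TEXT": 2.10, "AUDIO": 2.10}
print(dict(totals), f"${estimate_cost(totals, assumed_rates):.6f}")
```

In this single message the TEXT entry accounts for 410 of the 541 prompt tokens, so if it is counted on every turn it could outweigh the audio tokens I based my estimate on.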