Could someone help me understand Gemini Live API pricing?

Hey, I checked both the Vertex AI and Gemini API pricing pages.
They both mention that 1 second of audio is 32 tokens.
In my app I stream audio in and output text.

The Live API pricing is $2.10 per million input tokens. At the rate described above, 1 hour of audio should be 3,600 s × 32 = 115,200 tokens, or roughly $0.24. At least, that's how I calculated it (my output is really short, around 120–150 tokens per response, so it's not even worth including).
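Here's the back-of-the-envelope check I'm doing, in Python (the $2.10/M input rate and the 32 tokens/second figure are just what I took from the pricing page, so please correct me if those are wrong):

```python
# Back-of-the-envelope check: 1 hour of streamed audio input.
# Assumptions: 32 tokens per second of audio, $2.10 per 1M input tokens.
AUDIO_TOKENS_PER_SECOND = 32
INPUT_PRICE_PER_MILLION = 2.10  # USD, as I read it from the pricing page

seconds = 60 * 60                               # 1 hour of audio
tokens = seconds * AUDIO_TOKENS_PER_SECOND      # 115,200 tokens
cost = tokens / 1_000_000 * INPUT_PRICE_PER_MILLION

print(f"{tokens:,} tokens -> ${cost:.2f}")      # 115,200 tokens -> $0.24
```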

Basically I had about 1 h 30 min to 2 h of Live API consumption, and it came out to $5.

One thing I'd love to understand is what the TEXT modality in the prompt token details refers to. And how does one calculate the actual cost of the Live API? Here's an example usageMetadata block from one of my turns:

{
  "serverContent": {
    "turnComplete": true
  },
  "usageMetadata": {
    "promptTokenCount": 541,
    "responseTokenCount": 156,
    "totalTokenCount": 697,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 410
      },
      {
        "modality": "AUDIO",
        "tokenCount": 131
      }
    ],
    "responseTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 156
      }
    ]
  }
}
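For what it's worth, this is roughly how I imagine I'd have to price a single turn from that usageMetadata, splitting by modality. The per-modality rates below are placeholders I made up, not official numbers, which is exactly the part I'm unsure about:

```python
# Hypothetical per-modality rates in USD per 1M tokens. These are NOT
# official prices, just placeholders for the calculation I want to verify.
INPUT_RATES = {"TEXT": 0.50, "AUDIO": 2.10}
OUTPUT_RATES = {"TEXT": 2.00}

def turn_cost(usage_metadata: dict) -> float:
    """Sum the cost of one turn from a usageMetadata block like the one above."""
    cost = 0.0
    for detail in usage_metadata.get("promptTokensDetails", []):
        cost += detail["tokenCount"] / 1_000_000 * INPUT_RATES[detail["modality"]]
    for detail in usage_metadata.get("responseTokensDetails", []):
        cost += detail["tokenCount"] / 1_000_000 * OUTPUT_RATES[detail["modality"]]
    return cost

usage = {
    "promptTokensDetails": [
        {"modality": "TEXT", "tokenCount": 410},
        {"modality": "AUDIO", "tokenCount": 131},
    ],
    "responseTokensDetails": [
        {"modality": "TEXT", "tokenCount": 156},
    ],
}
print(f"${turn_cost(usage):.6f}")
```

Is that the right way to think about it, or am I missing something in how the Live API bills each turn?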

@Octavian_Ratiu,

As you pointed out, 1 second of audio counts as 32 tokens, but this is counted per inference / API call.

For example: say you send a 10-second audio prompt to the API and get 1,000 tokens of text output. Your input tokens are 320 and your output tokens are 1,000.
The next time you send a query that continues this conversation, say another 10 seconds of audio, your input is the cumulative context of (320 + 1,000) from the old conversation plus 320 for the current 10 seconds of audio = 1,640 tokens.
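A minimal sketch of that cumulative counting (assuming 32 tokens per second of audio; the second turn's output size is just an example number):

```python
# Sketch of how billed input tokens grow over a multi-turn Live session.
AUDIO_TOKENS_PER_SECOND = 32

history_tokens = 0          # everything already in the conversation
turns = [
    {"audio_seconds": 10, "output_tokens": 1000},  # turn 1 from the example above
    {"audio_seconds": 10, "output_tokens": 800},   # turn 2 (hypothetical output size)
]

for i, turn in enumerate(turns, start=1):
    new_input = turn["audio_seconds"] * AUDIO_TOKENS_PER_SECOND
    billed_input = history_tokens + new_input      # prior context is billed again
    print(f"turn {i}: billed input = {billed_input}, output = {turn['output_tokens']}")
    history_tokens += new_input + turn["output_tokens"]
```

Running this prints a billed input of 320 for turn 1 and 1,640 for turn 2, matching the numbers above.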

So your cost depends on how many chat iterations and LLM responses there are.

Hope that gives a better understanding of how to count tokens and estimate the price.

Thank you.