Why is the charge different from what I calculated?

Hello everyone,

I’m trying to understand my recent bill for the Gemini API and am facing a significant discrepancy between my expected cost and the actual amount charged. My primary use case is audio transcription, with the API being used from the Taiwan region.

Here is a summary of my token usage for the billing period:

  • Input Tokens: 6.17k (billing report filtered by the SKU "Generate content input token count gemini 2.5 flash input audio")

  • Output: The generated text transcript was minimal, so output tokens should be negligible.

  • Model Used: gemini-2.5-flash-preview-05-20

Based on the official pricing sheet, my expected cost should be around $0.00616. However, my actual bill is $0.44.

My question is: According to the pricing model, how could the token count listed above result in such a high charge?

Am I misinterpreting the pricing, or are there other factors (like specific types of calls, image tokens, etc.) that I might not be accounting for?

Any insights would be greatly appreciated. Thank you!

@Wade

Welcome to the community.

Your calculation is correct if the inference was done only once, i.e. if you sent all 6.17k input tokens in a single API call and got the output once. Is that the case?

Or were you sending smaller chunks of audio repeatedly within one conversation? For example:
Suppose you send 10 seconds of audio (320 tokens) and get a 100-token response.
If you then send another 10-second voice query in the same chat, the input for the 2nd call is 320 + 100 (from the previous call) + 320 (the current query) = 740 tokens.

So when you use the API as one long chat, the token count accumulates with every turn.
At each turn, the input is the cumulative sum of the previous conversation plus the current query.
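
To see how quickly this adds up, here is a rough Python sketch of the accumulation described above. The 320-token query and 100-token response come from the example; the 20-turn chat length and the $1.00 per 1M input tokens rate are assumptions for illustration only, so please check the pricing page for the real rate of your model. It also bills all history at a single rate, which is a simplification.

```python
# Assumed rate for illustration only (not taken from this thread):
# USD per 1M input tokens. Check the official pricing page for your model.
ASSUMED_USD_PER_MILLION_INPUT = 1.00


def cumulative_input_tokens(query_tokens, response_tokens, turns):
    """Return the input token count sent on each turn of one growing chat.

    Each turn re-sends the whole history (earlier queries and responses)
    plus the new query.
    """
    per_turn = []
    history = 0  # tokens from all earlier queries and responses
    for _ in range(turns):
        per_turn.append(history + query_tokens)
        history += query_tokens + response_tokens
    return per_turn


def input_cost_usd(tokens, usd_per_million=ASSUMED_USD_PER_MILLION_INPUT):
    """Convert a token count to USD at the assumed flat rate."""
    return tokens * usd_per_million / 1_000_000


# Hypothetical 20-turn chat using the numbers from the example above.
per_turn = cumulative_input_tokens(query_tokens=320, response_tokens=100, turns=20)
fresh_audio_only = 320 * 20      # what you might expect to be billed
actually_sent = sum(per_turn)    # what is sent once history is included

print(per_turn[:3])  # [320, 740, 1160]
print(fresh_audio_only, actually_sent)
print(input_cost_usd(fresh_audio_only), input_cost_usd(actually_sent))
```

In this made-up case the new audio totals 6,400 tokens, but 86,200 input tokens are sent across the chat, so the billed input can be many times larger than the sum of the individual queries.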

To reduce this, you can use caching.