How to count tokens when using context caching

Hi all,

When using context caching, I noticed that promptTokenCount always returns the tokens from my prompt plus the cached content (e.g., a file's content). But for billing estimation purposes, can I consider just the countTokens() result as the input tokens?

Thanks.

Hi @Alexandre_Makiyama, for billing purposes it's essential to consider 'promptTokenCount', because billing is calculated based on the total number of tokens processed by the model. However, cached tokens are billed at a lower rate when they are included in subsequent prompts.
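
To see where those numbers come from, here is a minimal sketch using the google-generativeai Python SDK's explicit caching support. It assumes a hypothetical file big_file.txt and API key; exact method and field names may differ slightly in your SDK version, so treat it as illustrative rather than definitive.

```python
# Sketch: inspect usage metadata on a response that uses an explicit cache.
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Create a cache from a large document so it is not re-uploaded each turn.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    contents=[open("big_file.txt").read()],  # hypothetical large file
)

model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("Summarize the key points.")

usage = response.usage_metadata
# prompt_token_count covers the new prompt PLUS the cached content;
# cached_content_token_count shows how many of those tokens came from the cache
# (and are therefore billed at the reduced cached-token rate).
print("prompt tokens:", usage.prompt_token_count)
print("cached tokens:", usage.cached_content_token_count)
print("output tokens:", usage.candidates_token_count)
```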


Hi @GUNAND_MAYANGLAMBAM, thanks for the reply.

I still don't get it. Every subsequent prompt returns 'promptTokenCount' as the total number of tokens, including the file content. Isn't the main purpose of context caching to avoid sending the file content (for example) repeatedly?

Edit: When I say "avoid sending the file content", I mean avoiding having the file content counted as input tokens on every prompt.

The pricing page might help: Gemini API pricing  |  Google AI for Developers
The token count is the same, caching or not. What changes is that, for the second, third, etc. request, while caching is in effect, the per-token rate ($/token) is significantly reduced for the cached tokens.

Hope that helps explain it.
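
As a rough worked example of what that means for an estimate, here is a small Python sketch. The rates are placeholders, not real prices; plug in the current $/token figures from the pricing page.

```python
# Hypothetical rates ($ per token) -- replace with the values from the pricing page.
INPUT_RATE = 0.35 / 1_000_000    # regular input tokens
CACHED_RATE = 0.0875 / 1_000_000  # reduced rate for cached tokens

def estimate_input_cost(prompt_token_count: int, cached_content_token_count: int) -> float:
    """Cached tokens are billed at the reduced rate; the remainder at the normal rate."""
    fresh = prompt_token_count - cached_content_token_count
    return fresh * INPUT_RATE + cached_content_token_count * CACHED_RATE

# Example: a 100,000-token cached file plus a 200-token question per request.
print(f"${estimate_input_cost(100_200, 100_000):.4f} per request")
```

So promptTokenCount stays the same each time, but most of it is charged at the cached rate, which is where the savings come from.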


Ok. Got it.

Thanks a lot.