Hi all,
When using context caching, I noticed promptTokenCount always returns the tokens from my prompt + the cached content (e.g., a file's content). But for billing estimation purposes, can I consider just countTokens() as the input tokens?
Tks.
Hi @Alexandre_Makiyama, for billing purposes it's essential to consider "promptTokenCount", because billing is calculated based on the total number of tokens processed by the model. However, cached tokens are billed at a lower rate when they are included in subsequent prompts.
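If it helps, here is a minimal sketch (assuming the google-generativeai Python SDK) of where the cached portion shows up in the response's usage metadata. The model name, API key, and document text below are just placeholders:

```python
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

big_document_text = "..."  # placeholder for the file content you want cached

# Cache the large document once so it doesn't have to be re-sent with every request.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    contents=[big_document_text],
)

model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("Summarize section 2 of the document.")

usage = response.usage_metadata
print(usage.prompt_token_count)          # your prompt + the cached content
print(usage.cached_content_token_count)  # the cached portion of that total
print(usage.candidates_token_count)      # output tokens
```

So promptTokenCount still reports everything the model processed; cachedContentTokenCount tells you how much of it was served from the cache (and is therefore charged at the reduced rate).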
Hi @GUNAND_MAYANGLAMBAM, thanks for the reply.
I still don't get it. All subsequent prompts still return "promptTokenCount" as the total number of tokens, including the file content. Isn't the main purpose of context caching to avoid sending the file content (for example) repeatedly?
Edit: When I say "avoid sending the file content", I mean avoid counting the file content as tokens on every prompt.
The pricing page might help: Gemini API Pricing | Google AI for Developers
The token count is the same whether caching is used or not. What changes is that for the second, third, etc. request, when caching is in effect, the rate ($/token) applied to the cached tokens is significantly reduced.
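As a rough illustration of how you could estimate input cost from the usage metadata, here is a small sketch; the per-token rates are hypothetical placeholders, so substitute the values from the pricing page for your model:

```python
# Illustrative rates only; replace with the actual numbers from the pricing page.
STANDARD_RATE = 0.075 / 1_000_000   # hypothetical $/token for regular input
CACHED_RATE = 0.01875 / 1_000_000   # hypothetical reduced $/token for cached input

def estimate_input_cost(prompt_tokens: int, cached_tokens: int) -> float:
    """Split promptTokenCount into the cached and non-cached portions."""
    fresh_tokens = prompt_tokens - cached_tokens
    return fresh_tokens * STANDARD_RATE + cached_tokens * CACHED_RATE

# Example: a 100k-token cached file plus a 200-token question.
print(estimate_input_cost(prompt_tokens=100_200, cached_tokens=100_000))
```

Keep in mind that, per the pricing page, storing the cached content is also billed separately (per token per hour), on top of the per-request token charges.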
Hope that helps explain it.
Ok. Got it.
Thanks a lot.