Question about Gemini API caching pricing

Hello,

I have a question about the caching pricing in the Gemini API.
I looked through the community posts but couldn’t find a clear answer, so I’d really appreciate your help.

Currently, the Gemini 2.5 Pro cache pricing is listed as follows:

  • $0.125 / 1M tokens, prompts ≤ 200k tokens

  • $0.25 / 1M tokens, prompts > 200k tokens

  • $4.50 / 1,000,000 tokens per hour (storage price)

I understand the storage price, but I’m not sure about the prompts pricing.
Does this fee apply when the cache is created, or each time a cached prompt is used in a request?

For example, suppose I have a cached prompt of 10,000 tokens that I use in three requests:

  1. If the charge applies when the cache is created:
    $0.125 / 1,000,000 × 10,000 = $0.00125

  2. If the charge applies for every request using the cached prompt:
    $0.125 / 1,000,000 × 10,000 × 3 = $0.00375

Which of the above is correct?


Also, the usage data is returned in the API response like this:

{
  "promptTokenCount": 11500,
  "candidatesTokenCount": 1000,
  "totalTokenCount": 22500,
  "cachedContentTokenCount": 10000,
  "thoughtsTokenCount": 10000
}

And the Gemini 2.5 Pro I/O prices (for ≤ 200k tokens) are:

  • Input: $1.25 / 1M tokens

  • Output: $2.50 / 1M tokens

If the cache charge is applied per request (as in case 2 above), would the following calculation for a single request be correct?

  • Input: (promptTokenCount − cachedContentTokenCount) / 1,000,000 × $1.25
    = (11,500 − 10,000) / 1,000,000 × $1.25 = $0.001875

  • Cached input: cachedContentTokenCount / 1,000,000 × $0.125
    = 10,000 / 1,000,000 × $0.125 = $0.00125

  • Output: (thoughtsTokenCount + candidatesTokenCount) / 1,000,000 × $2.50
    = (10,000 + 1,000) / 1,000,000 × $2.50 = $0.0275


Does this interpretation look correct?
Thank you in advance for your help!

Hello, thank you for the detailed and well-formulated question.
To answer your main question: The charge of $0.125 / 1M tokens applies each time you use the cached prompt in a request. It is not a one-time fee but the discounted input rate for the cachedContentTokenCount.
Think of the cache price not as an extra fee, but as a large discount for reusing your prompt.

Your second scenario, where the charge applies for every request, is the correct interpretation.

  1. Cached Input: The cachedContentTokenCount is billed at the discounted rate of $0.125 / 1M tokens.
  2. Standard Input: The new tokens (promptTokenCount - cachedContentTokenCount) are billed at the standard input rate of $1.25 / 1M tokens.
  3. Output: The total output cost is based on the sum of candidatesTokenCount and thoughtsTokenCount, billed at the standard output rate.
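If it helps, the three-part breakdown above can be sketched in a few lines of Python. This is only an illustration: the function name `request_cost` is made up for this post, and the rates are the ones quoted in this thread for prompts ≤ 200k tokens, so please check them against the current pricing page before relying on them.

```python
# Rates in USD per 1M tokens, copied from this thread (<= 200k token tier).
# Treat them as assumptions for illustration, not as current official pricing.
INPUT_RATE = 1.25
CACHED_INPUT_RATE = 0.125
OUTPUT_RATE = 2.50


def request_cost(usage: dict) -> dict:
    """Split the cost of one request into fresh input, cached input, and output."""
    cached = usage.get("cachedContentTokenCount", 0)
    # promptTokenCount includes the cached tokens, so subtract them to get the new ones.
    fresh = usage["promptTokenCount"] - cached
    # Thinking tokens are billed together with the visible output tokens.
    output = usage.get("candidatesTokenCount", 0) + usage.get("thoughtsTokenCount", 0)

    parts = {
        "input": fresh * INPUT_RATE / 1_000_000,
        "cached_input": cached * CACHED_INPUT_RATE / 1_000_000,
        "output": output * OUTPUT_RATE / 1_000_000,
    }
    parts["total"] = sum(parts.values())
    return parts


# Usage metadata from the question.
usage = {
    "promptTokenCount": 11500,
    "candidatesTokenCount": 1000,
    "totalTokenCount": 22500,
    "cachedContentTokenCount": 10000,
    "thoughtsTokenCount": 10000,
}

cost = request_cost(usage)
for name, value in cost.items():
    print(f"{name}: ${value:.6f}")
```

With the numbers from the question this reproduces the three figures you calculated ($0.001875, $0.00125, and $0.0275), for a total of about $0.030625 per request.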

Therefore, your interpretation is correct. The total cost for a single API call is the sum of those three parts, while the separate $4.50 / 1M tokens/hour fee is charged for storing the cache between calls.
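To see how the storage fee combines with the per-request charges, here is a rough sketch for your example of three requests. It assumes the cache is kept for exactly one hour and that storage is billed in proportion to the time stored; both are assumptions for illustration, and the helper names `storage_cost` and `total_cost` are hypothetical. The per-request figure is the sum of the three parts from your own calculation.

```python
# Storage rate in USD per 1M tokens per hour, as quoted in this thread.
STORAGE_RATE = 4.50


def storage_cost(cached_tokens: int, hours: float) -> float:
    """Cost of keeping the cache alive, assuming pro-rata hourly billing."""
    return cached_tokens * STORAGE_RATE / 1_000_000 * hours


def total_cost(per_request: float, n_requests: int,
               cached_tokens: int, hours: float) -> float:
    """Per-request charges plus storage for the lifetime of the cache."""
    return per_request * n_requests + storage_cost(cached_tokens, hours)


# One request from the question: 0.001875 (input) + 0.00125 (cached) + 0.0275 (output).
per_request = 0.001875 + 0.00125 + 0.0275

storage = storage_cost(cached_tokens=10_000, hours=1)
total = total_cost(per_request, n_requests=3, cached_tokens=10_000, hours=1)

print(f"storage for 1 hour: ${storage:.6f}")
print(f"three requests plus storage: ${total:.6f}")
```

Under these assumptions, storing 10,000 tokens for one hour costs about $0.045, and the three requests plus storage come to about $0.136875. Each cached request saves about $0.01125 compared with sending the same 10,000 tokens at the standard input rate, so with these figures the cache only pays for itself from roughly four requests per hour.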
