Question about Gemini API caching pricing

kota · October 12, 2025, 10:07am

Hello,

I have a question about the caching pricing in the Gemini API.
I looked through the community posts but couldn’t find a clear answer, so I’d really appreciate your help.

Currently, the Gemini 2.5 Pro cache pricing is listed as follows:

$0.125 / 1M tokens, prompts ≤ 200k tokens
$0.25 / 1M tokens, prompts > 200k tokens
$4.50 / 1,000,000 tokens per hour (storage price)

I understand the storage price, but I’m not sure about the prompts pricing.
Does this fee apply when the cache is created, or each time a cached prompt is used in a request?

For example, suppose I have a cached prompt of 10,000 tokens that I use in three requests:

Case 1: the charge applies once, when the cache is created:
$0.125 / 1,000,000 × 10,000 = $0.00125
Case 2: the charge applies to every request that uses the cached prompt:
$0.125 / 1,000,000 × 10,000 × 3 = $0.00375

Which of the above is correct?

Also, the usage data is returned in the API response like this:

{
  "promptTokenCount": 11500,
  "candidatesTokenCount": 1000,
  "totalTokenCount": 22500,
  "cachedContentTokenCount": 10000,
  "thoughtsTokenCount": 10000
}

And the Gemini 2.5 Pro I/O prices (for ≤ 200k tokens) are:

Input: $1.25 / 1M tokens
Output: $2.50 / 1M tokens

If the cache charge is applied per request (as in case 2 above), would the following calculation for a single request be correct?

Input: (promptTokenCount − cachedContentTokenCount) / 1,000,000 × $1.25
= (11,500 − 10,000) / 1,000,000 × $1.25 = $0.001875
Cached input: cachedContentTokenCount / 1,000,000 × $0.125
= 10,000 / 1,000,000 × $0.125 = $0.00125
Output: (thoughtsTokenCount + candidatesTokenCount) / 1,000,000 × $2.50
= (10,000 + 1,000) / 1,000,000 × $2.50 = $0.0275

Does this interpretation look correct?
Thank you in advance for your help!

Pooja_Kapse · November 6, 2025, 5:50am

Hello, thank you for the detailed and well-formulated question.
To answer your main question: the $0.125 / 1M tokens charge applies each time you use the cached prompt in a request. It is not a one-time fee but the discounted input rate applied to cachedContentTokenCount.
In other words, the cache price is not an extra fee; it is a large discount for reusing your prompt.

Your second scenario, where the charge applies for every request, is the correct interpretation.

Cached Input: The cachedContentTokenCount is billed at the discounted rate of $0.125 / 1M tokens.
Standard Input: The new tokens (promptTokenCount - cachedContentTokenCount) are billed at the standard input rate of $1.25 / 1M tokens.
Output: The total output cost is based on the sum of candidatesTokenCount and thoughtsTokenCount, billed at the standard output rate.
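
The three components above can be sketched in Python as follows. This is only an illustration: the rates are the per-1M-token figures quoted in this thread (for prompts ≤ 200k tokens) and may not match current pricing, and `request_cost` is a hypothetical helper, not part of any Gemini SDK.

```python
# Rates in USD per 1M tokens, copied from the figures quoted in this thread.
# They are assumptions for illustration and may not match current pricing.
STANDARD_INPUT_RATE = 1.25
CACHED_INPUT_RATE = 0.125
OUTPUT_RATE = 2.50


def request_cost(usage: dict) -> dict:
    """Split the cost of one request into input, cached input, and output."""
    cached = usage.get("cachedContentTokenCount", 0)
    # New (non-cached) input tokens are the prompt tokens minus the cached ones.
    fresh = usage["promptTokenCount"] - cached
    # Output is billed on visible candidates plus thinking tokens.
    output = usage.get("candidatesTokenCount", 0) + usage.get("thoughtsTokenCount", 0)

    costs = {
        "input": fresh / 1_000_000 * STANDARD_INPUT_RATE,
        "cached_input": cached / 1_000_000 * CACHED_INPUT_RATE,
        "output": output / 1_000_000 * OUTPUT_RATE,
    }
    costs["total"] = sum(costs.values())
    return costs


# Usage metadata from the original question.
usage = {
    "promptTokenCount": 11500,
    "candidatesTokenCount": 1000,
    "totalTokenCount": 22500,
    "cachedContentTokenCount": 10000,
    "thoughtsTokenCount": 10000,
}

costs = request_cost(usage)
for name, value in costs.items():
    print(f"{name}: ${value:.6f}")
```

With the usage data from the question, the three parts match the hand calculation: $0.001875 for new input, $0.00125 for cached input, and $0.0275 for output.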

Therefore, your interpretation is correct. The total cost for a single API call is the sum of those three parts, while the separate $4.50 / 1M tokens/hour fee is charged for storing the cache between calls.
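
To see how the storage fee interacts with the per-request discount, here is a rough sketch that compares the input-side cost of the 10,000-token prompt with and without caching. The rates are again the ones quoted in this thread, the function names are hypothetical, and the sketch assumes storage is billed proportionally to the hours stored and ignores any charge for creating the cache, which this thread does not cover.

```python
# Rates in USD per 1M tokens as quoted in this thread (assumed, may be outdated).
STANDARD_INPUT_RATE = 1.25
CACHED_INPUT_RATE = 0.125
STORAGE_RATE_PER_HOUR = 4.50  # per 1M tokens per hour


def input_cost_with_cache(cached_tokens: int, n_requests: int, hours_stored: float) -> float:
    """Discounted per-request charge plus storage for the time the cache is kept."""
    per_request = cached_tokens / 1_000_000 * CACHED_INPUT_RATE
    storage = cached_tokens / 1_000_000 * STORAGE_RATE_PER_HOUR * hours_stored
    return per_request * n_requests + storage


def input_cost_without_cache(prompt_tokens: int, n_requests: int) -> float:
    """The same tokens sent at the standard input rate on every request."""
    return prompt_tokens / 1_000_000 * STANDARD_INPUT_RATE * n_requests


# Example from the question: a 10,000-token prompt used in three requests,
# assuming the cache is kept for one hour.
with_cache = input_cost_with_cache(10_000, n_requests=3, hours_stored=1.0)
without_cache = input_cost_without_cache(10_000, n_requests=3)

print(f"with cache:    ${with_cache:.5f}")
print(f"without cache: ${without_cache:.5f}")
```

Under these assumed figures, three requests within an hour do not recoup the storage fee: each cached request saves $0.01125 on these 10,000 tokens, while one hour of storage costs $0.045, so caching breaks even at about four requests per hour.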
