Hello,
I have a question about the caching pricing in the Gemini API.
I looked through the community posts but couldn’t find a clear answer, so I’d really appreciate your help.
Currently, the Gemini 2.5 Pro cache pricing is listed as follows:
- $0.125 / 1M tokens, prompts ≤ 200k tokens
- $0.25 / 1M tokens, prompts > 200k tokens
- $4.50 / 1,000,000 tokens per hour (storage price)
I understand the storage price, but I’m not sure how the two per-prompt prices are charged.
Does this fee apply when the cache is created, or each time a cached prompt is used in a request?
For example, suppose I have a cached prompt of 10,000 tokens that I use in three requests:
- If the charge applies when the cache is created:
  $0.125 / 1,000,000 × 10,000 = $0.00125
- If the charge applies for every request using the cached prompt:
  $0.125 / 1,000,000 × 10,000 × 3 = $0.00375
Which of the above is correct?
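To make the two readings concrete, here is a minimal Python sketch of the same arithmetic. The price is the ≤ 200k figure quoted above; the function name `cache_read_cost` and its `per_request` flag are my own, not part of the Gemini API, and storage cost is left out.

```python
from decimal import Decimal

# USD per 1M cached tokens for prompts <= 200k tokens (price quoted in this post).
CACHE_PRICE_PER_M = Decimal("0.125")
CACHED_TOKENS = 10_000
REQUESTS = 3

def cache_read_cost(tokens, requests, per_request):
    """Hypothetical helper comparing the two billing interpretations.

    per_request=False: the fee is charged once, when the cache is created.
    per_request=True:  the fee is charged on every request that uses the cache.
    """
    single_charge = CACHE_PRICE_PER_M * tokens / Decimal(1_000_000)
    return single_charge * requests if per_request else single_charge

cost_on_creation = cache_read_cost(CACHED_TOKENS, REQUESTS, per_request=False)
cost_per_request = cache_read_cost(CACHED_TOKENS, REQUESTS, per_request=True)

print("charged once at creation:", cost_on_creation)
print("charged on each request: ", cost_per_request)
```

The two results differ by exactly the number of requests, so the gap grows with every reuse of the cache.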
Also, the usage data is returned in the API response like this:
{
"promptTokenCount": 11500,
"candidatesTokenCount": 1000,
"totalTokenCount": 22500,
"cachedContentTokenCount": 10000,
"thoughtsTokenCount": 10000
}
And the Gemini 2.5 Pro I/O prices (for ≤ 200k tokens) are:
- Input: $1.25 / 1M tokens
- Output: $2.50 / 1M tokens
If the cache charge is applied per request (as in case 2 above), would the following calculation for a single request be correct?
- Input: (promptTokenCount − cachedContentTokenCount) / 1,000,000 × $1.25
  = (11,500 − 10,000) / 1,000,000 × $1.25 = $0.001875
- Cached input: cachedContentTokenCount / 1,000,000 × $0.125
  = 10,000 / 1,000,000 × $0.125 = $0.00125
- Output: (thoughtsTokenCount + candidatesTokenCount) / 1,000,000 × $2.50
  = (10,000 + 1,000) / 1,000,000 × $2.50 = $0.0275
Does this interpretation look correct?
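For reference, this is the per-request calculation I have in mind, written as a small Python sketch. It assumes the per-request reading (case 2), uses only the prices quoted in this post, and ignores the hourly storage fee. The helper `request_cost` is my own name, not something the API provides.

```python
from decimal import Decimal

# USD per 1M tokens for prompts <= 200k tokens, as quoted in this post.
INPUT_PRICE = Decimal("1.25")
CACHED_INPUT_PRICE = Decimal("0.125")
OUTPUT_PRICE = Decimal("2.50")
PER_MILLION = Decimal(1_000_000)

# Usage data copied from the example response above.
usage = {
    "promptTokenCount": 11500,
    "candidatesTokenCount": 1000,
    "totalTokenCount": 22500,
    "cachedContentTokenCount": 10000,
    "thoughtsTokenCount": 10000,
}

def request_cost(usage):
    """Hypothetical helper: cost breakdown for one request (storage excluded)."""
    cached = usage.get("cachedContentTokenCount", 0)
    # Assumption: promptTokenCount includes the cached tokens, so subtract them.
    uncached = usage["promptTokenCount"] - cached
    # Assumption: thinking tokens are billed at the output rate.
    output = usage.get("candidatesTokenCount", 0) + usage.get("thoughtsTokenCount", 0)
    return {
        "input": uncached * INPUT_PRICE / PER_MILLION,
        "cached_input": cached * CACHED_INPUT_PRICE / PER_MILLION,
        "output": output * OUTPUT_PRICE / PER_MILLION,
    }

costs = request_cost(usage)
total = sum(costs.values())

for name, value in costs.items():
    print(f"{name}: ${value}")
print(f"total: ${total}")
```

If the cache fee turns out to be charged only at creation, the `cached_input` term would drop out of the per-request total.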
Thank you in advance for your help!