Hi,
I need to confirm how Vertex AI / Gemini bills caches.create for explicit context caching.
The documentation says:
For both implicit and explicit caching, you’re billed for the input tokens used to create the cache at the standard input token price. For explicit caching, there are also storage costs based on how long caches are stored.
Reference:
Does “also storage costs” mean explicit cache creation is charged as both standard input tokens and TTL storage, or is only storage charged at creation time?
Example:
- Model: gemini-2.5-pro
- Cached tokens: 100,000
- TTL: 5 minutes
- Standard input: $1.25 / 1M tokens
- Cached input: $0.13 / 1M tokens
- Storage: $4.50 / 1M token-hour
Interpretation A
caches.create is billed as:
- standard input token cost for the cached tokens
- plus storage cost for the TTL
So for 100,000 tokens and 5 minutes:
create input = 0.1 * $1.25 = $0.125
storage = 0.1 * $4.50 * (5 / 60) = $0.0375
total cache creation/storage cost = $0.1625 before any cachedContent read
Interpretation B
caches.create is billed only as storage:
storage = 0.1 * $4.50 * (5 / 60) = $0.0375
No separate standard input token charge is applied at cache creation time.
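To make the difference concrete, here is a minimal Python sketch of the arithmetic for both interpretations. The prices and token count are the example figures from this post, not confirmed billing rates, and the function names are my own (hypothetical), not part of any SDK.

```python
# Example figures from this post; assumptions for illustration only,
# not confirmed billing rates.
CACHED_TOKENS = 100_000
TTL_MINUTES = 5
STANDARD_INPUT_PER_M = 1.25  # USD per 1M input tokens
STORAGE_PER_M_HOUR = 4.50    # USD per 1M tokens per hour of storage


def create_input_cost(tokens: int) -> float:
    """Standard input charge for the tokens written into the cache."""
    return tokens / 1_000_000 * STANDARD_INPUT_PER_M


def storage_cost(tokens: int, ttl_minutes: float) -> float:
    """Storage charge for keeping the tokens cached for the TTL."""
    return tokens / 1_000_000 * STORAGE_PER_M_HOUR * (ttl_minutes / 60)


# Interpretation A: standard input charge plus storage.
interpretation_a = create_input_cost(CACHED_TOKENS) + storage_cost(CACHED_TOKENS, TTL_MINUTES)

# Interpretation B: storage only.
interpretation_b = storage_cost(CACHED_TOKENS, TTL_MINUTES)

print(f"Interpretation A: ${interpretation_a:.4f}")
print(f"Interpretation B: ${interpretation_b:.4f}")
```

The gap between the two is exactly the standard input charge ($0.125 in this example), so which interpretation applies matters more as the cached token count grows.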
My question is only about the cache creation step. I understand that later generateContent requests using cachedContent are billed at the cached input price for cachedContentTokenCount.
Does caches.create charge both standard input tokens and explicit cache storage, or storage only?
If Interpretation A is correct:
- Is CachedContent.usageMetadata.totalTokenCount the token count used for both cache creation input billing and storage token-hour billing?
- What SKU names should I expect in Cloud Billing export or Cost Table for the standard input charge and the explicit cache storage charge?
Thanks.