In the case of the Vertex API, the cost for creating explicit caching is clearly listed, so that part is understandable.
But for AIS or the general API, there’s no mention of a cache creation cost, and when I look at the billed amount, it seems to be calculated as if the cache creation cost was not included at all.
Is this a pricing structure where only Vertex charges for cache creation?
Hi @user4228 ,
Could you please clarify which API you are referring to by "AIS or the general API," and which specific billing report you are looking at?
Simply put, I’m wondering about the generation cost when using explicit caching with the AIS or Gemini APIs.
I saw a forum post stating that “after actual testing, there is no cache creation cost shown in the Google billing report or in the actual charged amount.”
However, when I look through other forum discussions, many people seem to treat cached usage as being charged under context caching pricing. For example, for Gemini 3.0 Pro, the pricing page
(https://ai.google.dev/gemini-api/docs/pricing?hl=en#standard) lists prices such as $0.20, and $0.40 for prompts over 200,000 tokens. Because of this, I'm not sure which interpretation is correct.
For Vertex AI, the documentation clearly states that the explicit cache creation cost is the same as the normal input cost. However, no matter how much I search, I can't find any similar explanation in the documentation for the standard Gemini API or AIS, so I wanted to ask for clarification.
Hi @user4228 ,
Based on the Gemini API documentation, explicit caching follows this billing structure:

- Creation: You are billed for the input token count used to create the cache.
- Storage: You are billed for how long the cached tokens are stored, based on the cached token count and the TTL duration.
- Cached token count: The cached input tokens are billed at a reduced rate when included in subsequent prompts.
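To make the three components concrete, here is a minimal arithmetic sketch of how they add up. All rates below are assumed placeholders for illustration only, not the actual Gemini API prices; check the official pricing page for real numbers.

```python
# Hypothetical explicit-caching cost breakdown.
# All rates are made-up placeholders, NOT real Gemini API prices.

INPUT_RATE = 0.20 / 1_000_000    # assumed $/token for cache creation
CACHED_RATE = 0.05 / 1_000_000   # assumed $/token for reduced-rate reuse
STORAGE_RATE = 1.00 / 1_000_000  # assumed $/token per hour of TTL storage

def explicit_cache_cost(cached_tokens: int, ttl_hours: float, reuse_count: int) -> float:
    """Sum the three billing components: creation, storage, and reuse."""
    creation = cached_tokens * INPUT_RATE               # one-time, at input rate
    storage = cached_tokens * STORAGE_RATE * ttl_hours  # scales with TTL
    reuse = cached_tokens * CACHED_RATE * reuse_count   # reduced rate per prompt
    return creation + storage + reuse

# e.g., 100k tokens cached for 1 hour and reused across 10 prompts
cost = explicit_cache_cost(100_000, ttl_hours=1, reuse_count=10)
print(f"${cost:.2f}")  # → $0.17 under the placeholder rates above
```

The point of the breakdown is that cache creation is billed once at the input rate, while storage scales with the TTL you choose and reuse is what earns the discount.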