In the case of the Vertex API, the cost for creating explicit caching is clearly listed, so that part is understandable.
But for AIS or the general API, there’s no mention of a cache creation cost, and when I look at the billed amount, it seems to be calculated as if the cache creation cost was not included at all.
Is this a pricing structure where only Vertex charges for cache creation?
Hi @user4228 ,
Could you please clarify which API you are referring to by "AIS or the general API," and which specific billing report you are looking at?
Simply put, I’m wondering about the generation cost when using explicit caching with the AIS or Gemini APIs.
I saw a forum post stating that “after actual testing, there is no cache creation cost shown in the Google billing report or in the actual charged amount.”
However, when I look through other forum discussions, many people seem to treat cached usage as being charged under context caching pricing. For example, for Gemini 3.0 Pro, the pricing page
(https://ai.google.dev/gemini-api/docs/pricing?hl=en#standard) lists prices such as $0.20, and $0.40 for prompts over 200,000 tokens. Because of this, I'm not sure which interpretation is correct.
For Vertex AI, the documentation clearly states that the explicit cache creation cost is the same as the normal input cost. However, no matter how much I search, I can't find any similar explanation in the documentation for the standard Gemini API or AIS, so I wanted to ask for clarification.
Hi @user4228 ,
Based on the Gemini API documentation, explicit caching follows this billing structure:

- Creation: You are billed for the input token count used to create the cache.
- Storage: You are billed for how long the cached tokens are stored, based on the cached token count and the TTL duration.
- Cached token count: The cached input tokens are billed at a reduced rate when included in subsequent prompts.
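To make the three components concrete, here is a minimal arithmetic sketch of how they add up. All rates below are assumed placeholders for illustration only, not the actual Gemini API prices; check the official pricing page for real numbers.

```python
# Hypothetical explicit-caching cost breakdown.
# All rates are made-up placeholders, NOT real Gemini API prices.

INPUT_RATE = 0.20 / 1_000_000    # assumed $/token for cache creation
CACHED_RATE = 0.05 / 1_000_000   # assumed $/token for reduced-rate reuse
STORAGE_RATE = 1.00 / 1_000_000  # assumed $/token per hour of TTL storage

def explicit_cache_cost(cached_tokens: int, ttl_hours: float, reuse_count: int) -> float:
    """Sum the three billing components: creation, storage, and reuse."""
    creation = cached_tokens * INPUT_RATE               # one-time, at input rate
    storage = cached_tokens * STORAGE_RATE * ttl_hours  # scales with TTL
    reuse = cached_tokens * CACHED_RATE * reuse_count   # reduced rate per prompt
    return creation + storage + reuse

# e.g., 100k tokens cached for 1 hour and reused across 10 prompts
cost = explicit_cache_cost(100_000, ttl_hours=1, reuse_count=10)
print(f"${cost:.2f}")  # → $0.17 under the placeholder rates above
```

The point of the breakdown is that cache creation is billed once at the input rate, while storage scales with the TTL you choose and reuse is what earns the discount.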