Does explicit context cache creation support Flex PayGo pricing?

Nithin_Saji · May 31, 2026, 5:37pm

Hi Google AI team,

I am using Gemini on Vertex AI with Flex PayGo via the global endpoint and these request headers:

X-Vertex-AI-LLM-Request-Type: shared

X-Vertex-AI-LLM-Shared-Request-Type: flex
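
For reference, here is a minimal sketch of how I keep these two headers in one place so that cache creation and generate calls are guaranteed to send the same values. The helper name `flex_headers` is hypothetical. I am also assuming the resulting dict is handed to the client's HTTP options (for example the `headers` field of `HttpOptions` in the google-genai SDK); that wiring is not shown here.

```python
# The two header names and values are the ones quoted above.
FLEX_PAYGO_HEADERS = {
    "X-Vertex-AI-LLM-Request-Type": "shared",
    "X-Vertex-AI-LLM-Shared-Request-Type": "flex",
}


def flex_headers(extra=None):
    """Return a fresh header dict with the Flex PayGo headers applied.

    `extra` holds any other headers the caller wants to send. The Flex
    headers are applied last so they cannot be overridden by accident.
    """
    merged = dict(extra or {})
    merged.update(FLEX_PAYGO_HEADERS)
    return merged


if __name__ == "__main__":
    # Assumption: this dict would be passed to the SDK client's HTTP options.
    for name, value in flex_headers().items():
        print(f"{name}: {value}")
```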

I tested creating an explicit context cache with audio content and then using it in generate_content with the same Flex client. The API accepts it, and the generate response reports:

traffic_type: ON_DEMAND_FLEX

cached_content_token_count:

My question is about billing:

When using explicit context caching with a Flex PayGo client/header, is the cache creation billed at Flex PayGo pricing or standard PayGo input pricing?

Related question: when a cached content resource is referenced from a Flex PayGo request, do cached-token read discounts combine with Flex pricing, or is explicit-cache pricing applied independently?

Use case: audio pipeline where every clip is first classified, and around 50% of clips are then transcribed. We want to know whether explicit caching can preserve Flex’s 50% audio input discount, or whether we should rely only on implicit caching.
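
To make the trade-off concrete, here is a rough sketch of the per-clip input cost under the billing outcomes I am asking about. Only the 50% Flex discount and the 50% transcription rate come from my setup. The cached-read multiplier (0.1), the normalized base rate, and the scenario names are placeholders I made up for illustration, not published prices. Cache storage cost and output tokens are ignored.

```python
from dataclasses import dataclass

FLEX_MULTIPLIER = 0.5          # Flex's 50% audio input discount (from the post)
CACHED_READ_MULTIPLIER = 0.1   # HYPOTHETICAL cached-token read rate, not a real price


@dataclass(frozen=True)
class Scenario:
    name: str
    creation_multiplier: float  # fraction of base rate paid once to create the cache
    read_multiplier: float      # fraction of base rate paid per request for the audio


def cost_per_clip(scenario, audio_tokens, base_rate_per_million,
                  transcribe_fraction=0.5):
    """Expected audio input cost for one clip.

    Every clip is classified once, and `transcribe_fraction` of clips are
    also transcribed, so the audio is read (1 + transcribe_fraction) times
    on average. Storage cost for the cache is NOT included.
    """
    base = audio_tokens / 1_000_000 * base_rate_per_million
    expected_requests = 1 + transcribe_fraction
    return base * (scenario.creation_multiplier
                   + expected_requests * scenario.read_multiplier)


SCENARIOS = [
    # No explicit cache: every request pays the Flex input rate.
    Scenario("flex_no_cache", 0.0, FLEX_MULTIPLIER),
    # Cache created at standard rate, reads billed at the cached rate only.
    Scenario("cache_standard_independent", 1.0, CACHED_READ_MULTIPLIER),
    # Cache created at Flex rate, cached-read discount stacks with Flex.
    Scenario("cache_flex_combined", FLEX_MULTIPLIER,
             FLEX_MULTIPLIER * CACHED_READ_MULTIPLIER),
]

if __name__ == "__main__":
    # Base rate normalized to 1.0 per million tokens, so results are ratios.
    for s in SCENARIOS:
        print(s.name, round(cost_per_clip(s, 1_000_000, 1.0), 4))
```

Under these placeholder numbers, explicit caching only beats plain Flex if cache creation itself is billed at the Flex rate, which is why the answer to the first question matters for us.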

Thanks.
