The documentation states:

> Minimum cache token count for implicit and explicit caching
> - Gemini 3 and Gemini 3.1 models: 4,096 tokens
> - Gemini 2.0 and 2.5 models: 2,048 tokens

https://docs.cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-overview#limits
But in reality, I was able to create a cache for the 3.1 model with only 1,024 tokens:
```python
from google import genai
from google.genai import types

client = genai.Client(vertexai=True)

cache = await client.aio.caches.create(
    model="gemini-3.1-pro-preview",
    config=types.CreateCachedContentConfig(
        contents=[
            types.Content(
                role="user",
                parts=[types.Part(text=text)],
            )
        ],
        display_name="vertex-cached-prompt",
        ttl="120s",
    ),
)
```
And the error I got on an attempt to create a cache for a much smaller prompt (~100 tokens):
```
google.genai.errors.ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'The cached content is of 151 tokens. The minimum token count to start caching is 1024.', 'status': 'INVALID_ARGUMENT'}}
```
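For context, until the limits are clarified I'm working around this with a client-side pre-check of the prompt's token count (obtained via `count_tokens`) against the minimum the API actually enforces. A minimal sketch; the threshold values here are my own assumptions based on the observations above, not an official table:

```python
# Hypothetical guard: reject prompts below the minimum the API actually
# enforces, before calling caches.create. The thresholds are assumptions:
# 1024 is what the 400 error above reports for gemini-3.1-pro-preview,
# even though the docs say 4,096 for Gemini 3.x models.
OBSERVED_MIN_CACHE_TOKENS = {
    "gemini-3.1-pro-preview": 1024,  # observed in the error message above
}
DOCUMENTED_DEFAULT_MIN = 2048  # documented minimum for Gemini 2.0/2.5 models


def can_cache(model: str, token_count: int) -> bool:
    """Return True if token_count meets the (assumed) minimum for the model."""
    minimum = OBSERVED_MIN_CACHE_TOKENS.get(model, DOCUMENTED_DEFAULT_MIN)
    return token_count >= minimum
```

This avoids burning a request on a cache creation that is guaranteed to fail, but it obviously just encodes whichever limit turns out to be correct.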
Can you clarify this please?