Gemini 2.5 Flash Lite: Implicit Caching Not Working Despite Meeting Documented Requirements

ding_fow · October 12, 2025, 5:15am

Hello Gemini team,

I’ve been testing implicit caching with Gemini 2.5 Flash Lite Latest and encountered unexpected behavior that
differs from the documentation. I’d appreciate clarification on whether Flash Lite has different caching
mechanisms compared to standard Flash.

Issue Summary

Despite meeting the documented 1024-token threshold for implicit caching, Flash Lite consistently returns
cached_content_token_count = 0 in production. In my controlled tests, cache hits only appeared once the input
reached ~3000 tokens.

Test Evidence

Production Environment (No Cache Hits)

Model: gemini-flash-lite-latest
System prompt: ~1301 tokens (fixed across requests)
Total input: ~1551 tokens per request
Results: 377 API calls, 0 cache hits (cached_content_token_count always 0)
Configuration: Used system_instruction parameter with identical content

Controlled Test (Partial Cache Hits)

I wrote a test script to systematically verify caching behavior:

Test Scenario 1 (~3177 tokens):
Request 1: cached_content_token_count = 0
Request 2: cached_content_token_count = 0
Request 3: cached_content_token_count = 3060

Test Scenario 2 (~1500 tokens):
All requests: cached_content_token_count = 0
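
A minimal sketch of how such a test script could be structured (not the exact script used above). The helper name probe_implicit_cache and the fake_request stand-in are hypothetical; the stand-in simply replays the Scenario 1 numbers so the sketch runs without an API key. It assumes cached_content_token_count may be missing or None when nothing was cached.

```python
import time
from types import SimpleNamespace
from typing import Callable, List


def probe_implicit_cache(send_request: Callable[[], object],
                         attempts: int = 3,
                         delay_s: float = 0.0) -> List[int]:
    """Send the same request several times and record cached token counts."""
    counts = []
    for i in range(attempts):
        response = send_request()
        usage = getattr(response, "usage_metadata", None)
        # Treat a missing or None field as "no cached tokens".
        counts.append(getattr(usage, "cached_content_token_count", None) or 0)
        if delay_s and i < attempts - 1:
            time.sleep(delay_s)
    return counts


# Hypothetical stand-in for a real API call; replays Test Scenario 1.
_replay = iter([0, 0, 3060])


def fake_request():
    usage = SimpleNamespace(cached_content_token_count=next(_replay))
    return SimpleNamespace(usage_metadata=usage)


scenario_1 = probe_implicit_cache(fake_request, attempts=3)
print(scenario_1)
```

Against the real API, send_request would be a zero-argument function wrapping the client.models.generate_content call with a fixed prompt, so every attempt shares the same prefix.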

Documentation Gaps

Official documentation states implicit caching triggers at 1024 tokens for Flash models
Reality: In our tests, Flash Lite produced no cache hits at ~1500 tokens and only started hitting at ~3000 tokens; the
community report below suggests the real threshold may be higher still (~6000+)
Unclear differentiation: The docs mention “Gemini 2.5 series” but don’t explicitly clarify whether Flash Lite has
different caching thresholds or mechanisms

Related Community Reports

I found a community bug report, “Gemini 2.5 Flash implicit caching problem”, where Google engineers confirmed
the actual threshold is much higher (~6000-8000 tokens) for standard Flash, despite the documentation stating
1024 tokens.

Questions for Google Team

Does Flash Lite have a different implicit caching threshold than standard Flash?
What is the actual token threshold for Flash Lite to trigger implicit caching?
Is the system_instruction parameter content included in the caching prefix matching?
When will the documentation be updated to reflect actual caching behavior?

Requests

Official clarification on Gemini 2.5 Flash Lite’s caching trigger conditions
Documentation update to explicitly state Flash Lite requirements vs standard Flash
Bug fix or clear guidance on achieving consistent caching behavior at lower token counts (as originally
documented)

Technical Details

API Configuration:
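from google import genai
from google.genai import types

client = genai.Client()  # picks up the API key from the environment

response = client.models.generate_content(
    model="models/gemini-flash-lite-latest",
    contents=[...],  # request contents elided in this post
    config=types.GenerateContentConfig(
        system_instruction=fixed_prompt,  # ~1301 tokens, identical on every request
        temperature=0.8,
        top_p=0.95,
    ),
)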

Cache Detection:
cached_tokens = response.usage_metadata.cached_content_token_count
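
To compare the cached count against the full prompt size, something like the following could be logged per request. This is only a sketch: cache_fraction is a hypothetical helper, and it assumes the response exposes usage_metadata.prompt_token_count alongside cached_content_token_count, with either field possibly None. The sample object below is a stand-in built from the Scenario 1 numbers, not a real API response.

```python
from types import SimpleNamespace


def cache_fraction(response) -> float:
    """Fraction of prompt tokens served from cache (0.0 if none or unknown)."""
    usage = getattr(response, "usage_metadata", None)
    prompt = getattr(usage, "prompt_token_count", None) or 0
    cached = getattr(usage, "cached_content_token_count", None) or 0
    if prompt <= 0:
        return 0.0
    return cached / prompt


# Stand-in response using the Scenario 1, request 3 figures.
sample = SimpleNamespace(
    usage_metadata=SimpleNamespace(
        prompt_token_count=3177,
        cached_content_token_count=3060,
    )
)
fraction = cache_fraction(sample)
print(f"{fraction:.1%} of prompt tokens were cached")
```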

Environment:

SDK: google-genai (latest)
Deployment: Production service with high request volume
Use case: Translation service with consistent system prompts

Impact

This discrepancy significantly affects cost estimation and system design for production applications.
Understanding the exact caching behavior is crucial for our service planning.
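
As a rough illustration of the cost effect, here is a small sketch of the arithmetic. The function name effective_input_tokens is hypothetical, and the 0.75 discount on cached tokens is a placeholder assumption, not a figure from this post; check current pricing before relying on it. The example assumes the whole ~1301-token system prompt would be cached if implicit caching worked at this size.

```python
def effective_input_tokens(total_input: int, cached: int,
                           cached_discount: float) -> float:
    """Billable-equivalent input tokens when cached tokens are discounted."""
    if not 0 <= cached <= total_input:
        raise ValueError("cached must be between 0 and total_input")
    if not 0.0 <= cached_discount <= 1.0:
        raise ValueError("cached_discount must be between 0 and 1")
    uncached = total_input - cached
    return uncached + cached * (1.0 - cached_discount)


ASSUMED_DISCOUNT = 0.75  # placeholder assumption, not from the post

# Production figures from the post: ~1551 input tokens, ~1301 in the system prompt.
without_cache = effective_input_tokens(1551, 0, ASSUMED_DISCOUNT)
with_cache = effective_input_tokens(1551, 1301, ASSUMED_DISCOUNT)
savings = 1.0 - with_cache / without_cache
print(f"no cache: {without_cache:.2f}, with cache: {with_cache:.2f}, "
      f"savings: {savings:.1%}")
```

Under these assumptions, the gap between the two figures is what goes missing on each request when the cache never hits.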

Thank you for your attention to this issue. Happy to provide additional test data or collaborate on reproducing
the behavior.

JuanIdrobo · March 4, 2026, 10:36pm

This keeps happening. I tried with a 2k-token prompt and nothing gets cached.
