Hello, I’m experiencing an issue where implicit caching is not applied when calling gemini-2.5-pro via the API. According to the documentation, implicit caching should activate once the input exceeds 2,048 tokens, but I’m not seeing any caching even with prompt token counts above 30k.
To verify whether caching is working, I’ve run multiple tests with identical prompts, data, and structure, but prompt caching never activates. Every request returns usage metadata like the following, with no cache hits:
```
cache_tokens_details=None
cached_content_token_count=None
candidates_token_count=12841
candidates_tokens_details=None
prompt_token_count=30151
prompt_tokens_details=[
  ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=2545),
  ModalityTokenCount(modality=<MediaModality.IMAGE: 'IMAGE'>, token_count=27606)
]
thoughts_token_count=9003
tool_use_prompt_token_count=None
tool_use_prompt_tokens_details=None
total_token_count=51995
traffic_type=None
```
I have several questions regarding this issue:
- Is this a known ongoing issue? There have been previous reports in this forum of implicit caching problems with Gemini 2.5 Pro. Has that issue been resolved, or is it still active?
- Modality requirements clarification: My input includes a significant number of Base64-encoded images. When the documentation says caching activates above 2,048 tokens, does that threshold count TEXT-modality tokens only, or tokens of all modalities? In my case I have 2,545 text tokens and 27,606 image tokens.
- Model-specific behavior: For reference, I’ve confirmed that gemini-2.5-flash does trigger implicit caching under the same settings and configuration.
This issue has significant cost implications for our production environment, since we’re not getting the expected caching discount on large prompts. Any confirmation that this is a known issue, or guidance on correct implementation, would be greatly appreciated.