Have anyone checked out the implicit caching for gemini api, caches hits are inconsistent for me

M.S.Darshan_Kirthic · May 10, 2025, 9:29am

I’m experiencing inconsistent cache hits when sending requests with the same prefix. I would like to understand:

What is the TTL (Time To Live) for the Gemini API cache? Providing at least an approximate timeframe would be helpful for planning purposes.
In my production environment, I’m using Gemini API through the native client (without Vertex AI) to handle thousands of requests per second minimum. Even when sending requests with identical prefixes, I’m noticing cache misses. Is there a specific way to reference the cache or chat session to improve hit rates?
Currently, I’m including a timestamp in the system prompt for each API call. Will this timestamp cause cache misses since it changes with every request?

Kiran_Sai_Ramineni · May 12, 2025, 9:57am

The TTL of context depends upon the time you have defined to a ttl argument in the CreateCachedContentConfig. If not set, the TTL defaults to 1 hour.

If you are using explicit context caching make sure substantial initial context is referenced for shorter requests.

Could you please let us know, if you are using the same or different time stamp in the system prompt. Thank You.

M.S.Darshan_Kirthic · May 12, 2025, 12:28pm

hi @Kiran_Sai_Ramineni
I am talking about implicit caching not explicit caching

so we cannot define TTL of the cache that were natively created by google.
the time stamps will be different for each of the calls made because it will have seconds included in it.
I could observe that caches are missed after removing the timestamp from the system prompt.

-------- Log for token count: 1470 attempt: 1 times: 10------------------------

Is cached: False

INPUT TOKENS: 1482

CACHED TOKENS: None

Thought tokens: 100

Total tokens: 1582

-------- Log for token count: 5147 attempt: 2 times: 25------------------------

Is cached: False

INPUT TOKENS: 5159

CACHED TOKENS: None

Thought tokens: 54

Total tokens: 5214

-------- Log for token count: 12499 attempt: 3 times: 50------------------------

Is cached: True

INPUT TOKENS: 12511

CACHED TOKENS: 5136

Thought tokens: 100

Total tokens: 12611

-------- Log for token count: 27201 attempt: 4 times: 100------------------------

Is cached: True

INPUT TOKENS: 27213

CACHED TOKENS: 1028

Thought tokens: 100

Total tokens: 27313

-------- Log for token count: 56603 attempt: 5 times: 200------------------------

Is cached: True

INPUT TOKENS: 56615

CACHED TOKENS: 26776

Thought tokens: 100

Total tokens: 56715

-------- Log for token count: 93355 attempt: 6 times: 250------------------------

Is cached: False

INPUT TOKENS: 93367

CACHED TOKENS: None

Thought tokens: 100

Total tokens: 93467

-------- Log for token count: 144807 attempt: 7 times: 350------------------------

Is cached: True

INPUT TOKENS: 144819

CACHED TOKENS: 90678

Thought tokens: 95

Total tokens: 144915

-------- Log for token count: 218309 attempt: 8 times: 500------------------------

Is cached: False

INPUT TOKENS: 218321

CACHED TOKENS: None

Thought tokens: 73

Total tokens: 218395

-------- Log for token count: 306511 attempt: 9 times: 600------------------------

Is cached: True

INPUT TOKENS: 306523

CACHED TOKENS: 214381

Thought tokens: 100

Total tokens: 306623

-------- Log for token count: 424113 attempt: 10 times: 800------------------------

Is cached: False

INPUT TOKENS: 424125

CACHED TOKENS: None

Thought tokens: 54

Total tokens: 424180

-------- Log for token count: 571115 attempt: 11 times: 1000------------------------

Is cached: True

INPUT TOKENS: 571127

CACHED TOKENS: 420566

Thought tokens: 100

Total tokens: 571227

-------- Log for token count: 791617 attempt: 12 times: 1500------------------------

Is cached: False

INPUT TOKENS: 791629

CACHED TOKENS: None

Thought tokens: 100

Total tokens: 791729

Token limit exceeded

Token count: 1085619

Cache hits: 6

Cache misses: 6

END OF LOG

script used to generate the log

Max_Hedge · May 12, 2025, 3:08pm

I’m getting similar results => https://x.com/maxhedge/status/1921219724207112222

M.S.Darshan_Kirthic · May 13, 2025, 9:54am

The caching has improved a lot comparing the the log which i took on last saturday, not sure why the cache are missed in the subsequent request after a successful cache hit.

Kiran_Sai_Ramineni · May 16, 2025, 7:28am

Hi @M.S.Darshan_Kirthic, For implicit context caching we cannot define the TTL and also there is no guarantee TTL time for implicit context caching. If you want a guarantee you have to go for the explicit context caching. Thank You.

Yusuf · May 20, 2025, 3:16pm

Gemini latest pararamatres cant be treaked ? Or can it ?

Kiran_Sai_Ramineni · June 13, 2025, 9:49am

Hi @Yusuf, You can define parameters like temperature, top_k, top_p, max_token_length, stop_sequence in the generation config. Thank You.

Topic		Replies	Views
Implicit Caching: Gemini 2.5 Pro Preview 05-06 Gemini API context_caching , gemini_25_pro	2	202	May 12, 2025
Did My Vertex AI Input Caching Fail? Gemini API help-request , generative-ai	2	59	May 2, 2025
Any attempt to use cached context results in 500 Internal Server error Gemini API api , model	1	183	December 25, 2024
Gemini API so slow . Am i doing something wrong? Gemini API api , prompt	7	3600	November 21, 2024
Query: Gemini 2.0 Flash-Lite Explicit Caching Costs and Max TTL Limit Gemini API gemini-flash , context_caching	4	71	May 20, 2025

Have anyone checked out the implicit caching for gemini api, caches hits are inconsistent for me

Related topics