Has anyone checked out implicit caching for the Gemini API? Cache hits are inconsistent for me

I’m experiencing inconsistent cache hits when sending requests with the same prefix. I would like to understand:

  1. What is the TTL (Time To Live) for the Gemini API cache? Providing at least an approximate timeframe would be helpful for planning purposes.
  2. In my production environment, I’m using the Gemini API through the native client (without Vertex AI) to handle a minimum of thousands of requests per second. Even when sending requests with identical prefixes, I’m noticing cache misses. Is there a specific way to reference the cache or chat session to improve hit rates?
  3. Currently, I’m including a timestamp in the system prompt on each API call. Will this timestamp cause cache misses, since it changes with every request? (See the sketch just below.)
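For illustration, this is roughly the pattern in question (the prompt text here is hypothetical, not my actual prompt): a per-call timestamp rendered into the system prompt changes the earliest tokens of every request, so no two requests share an identical prefix even when everything else matches byte for byte.

```python
import datetime

def build_system_prompt() -> str:
    # Hypothetical reconstruction: the timestamp makes the first tokens
    # of the request differ on every call, down to the second.
    now = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    return f"Current time: {now}\nYou are a helpful assistant."
```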

@Kiran_Sai_Ramineni

Hi @M.S.Darshan_Kirthic,

The TTL of the cached context depends on the value you set for the ttl argument in CreateCachedContentConfig. If not set, the TTL defaults to 1 hour.
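For reference, a minimal sketch of setting the TTL with the google-genai Python SDK (the model name and contents are placeholders, and the cached content has to meet the model's minimum token count):

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Create an explicit cache with a custom TTL. The contents below are
# placeholders; real cached content must be substantially larger.
cache = client.caches.create(
    model="gemini-2.0-flash-001",  # placeholder model
    config=types.CreateCachedContentConfig(
        system_instruction="You are an expert on this document.",
        contents=["<large shared context goes here>"],
        ttl="7200s",  # two hours; defaults to 1 hour if omitted
    ),
)

# Reference the cache by name on each subsequent request.
response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="A short question against the cached context.",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.usage_metadata.cached_content_token_count)
```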

If you are using explicit context caching, make sure the substantial initial context is referenced by your shorter follow-up requests.

Could you please let us know whether you are using the same or a different timestamp in the system prompt? Thank you.

Hi @Kiran_Sai_Ramineni,
I am talking about implicit caching, not explicit caching:

  • So we cannot define the TTL for caches that are created natively by Google.
  • The timestamps will be different on every call, because they include seconds.
  • Even after removing the timestamp from the system prompt, I still observe cache misses (a prefix-ordering sketch and the full log follow below).
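Since implicit caching matches on a common token prefix, one workaround is to keep the system prompt byte-identical across calls and move variable content such as the timestamp to the end of the request. A sketch, assuming the google-genai SDK and an illustrative model name:

```python
import datetime

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Identical on every call, so the request prefix stays cacheable.
STABLE_SYSTEM_PROMPT = "You are a helpful assistant."

def ask(question: str) -> str:
    # Variable content (timestamp, user question) goes last, keeping the
    # shared prefix intact across requests.
    now = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    response = client.models.generate_content(
        model="gemini-2.5-flash",  # illustrative; implicit caching support varies by model
        contents=f"{question}\n\n(current time: {now})",
        config=types.GenerateContentConfig(system_instruction=STABLE_SYSTEM_PROMPT),
    )
    return response.text
```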

Log from my test run, condensed into a table:

| Attempt | times | Token count | Is cached | Input tokens | Cached tokens | Thought tokens | Total tokens |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 10 | 1470 | False | 1482 | None | 100 | 1582 |
| 2 | 25 | 5147 | False | 5159 | None | 54 | 5214 |
| 3 | 50 | 12499 | True | 12511 | 5136 | 100 | 12611 |
| 4 | 100 | 27201 | True | 27213 | 1028 | 100 | 27313 |
| 5 | 200 | 56603 | True | 56615 | 26776 | 100 | 56715 |
| 6 | 250 | 93355 | False | 93367 | None | 100 | 93467 |
| 7 | 350 | 144807 | True | 144819 | 90678 | 95 | 144915 |
| 8 | 500 | 218309 | False | 218321 | None | 73 | 218395 |
| 9 | 600 | 306511 | True | 306523 | 214381 | 100 | 306623 |
| 10 | 800 | 424113 | False | 424125 | None | 54 | 424180 |
| 11 | 1000 | 571115 | True | 571127 | 420566 | 100 | 571227 |
| 12 | 1500 | 791617 | False | 791629 | None | 100 | 791729 |

Token limit exceeded (token count: 1085619)

Cache hits: 6

Cache misses: 6

END OF LOG

Script used to generate the log:
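The original script isn't reproduced here, but a minimal sketch of how such a log can be produced (the model name, filler text, and step sizes are assumptions chosen to mirror the log format above) would look like this:

```python
from google import genai

MODEL = "gemini-2.5-flash"  # assumed
client = genai.Client(api_key="YOUR_API_KEY")

hits = misses = 0
prefix = ""
steps = [10, 25, 50, 100, 200, 250, 350, 500, 600, 800, 1000, 1500]
for attempt, times in enumerate(steps, start=1):
    # Extend the common prefix so each request shares all previous content.
    prefix += "Some shared filler paragraph for the cache probe. " * times
    prompt = prefix + "\nReply with the single word OK."
    token_count = client.models.count_tokens(model=MODEL, contents=prompt).total_tokens

    usage = client.models.generate_content(model=MODEL, contents=prompt).usage_metadata
    cached = usage.cached_content_token_count  # None when nothing was cache-hit
    hits += cached is not None
    misses += cached is None

    print(f"-------- Log for token count: {token_count} attempt: {attempt} times: {times} ------")
    print(f"Is cached: {cached is not None}")
    print(f"INPUT TOKENS: {usage.prompt_token_count}")
    print(f"CACHED TOKENS: {cached}")
    print(f"Thought tokens: {usage.thoughts_token_count}")
    print(f"Total tokens: {usage.total_token_count}")

print(f"Cache hits: {hits}")
print(f"Cache misses: {misses}")
```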

I’m getting similar results => https://x.com/maxhedge/status/1921219724207112222

Caching has improved a lot compared to the log I took last Saturday, but I'm not sure why the cache is missed on the request immediately following a successful cache hit.