I’m experiencing inconsistent cache hits when sending requests with the same prefix. I would like to understand:
What is the TTL (Time To Live) for the Gemini API cache? Providing at least an approximate timeframe would be helpful for planning purposes.
In my production environment, I’m using the Gemini API through the native client (without Vertex AI) to handle at least thousands of requests per second. Even when sending requests with identical prefixes, I’m seeing cache misses. Is there a specific way to reference the cache or the chat session to improve hit rates?
Currently, I’m including a timestamp in the system prompt for each API call. Will this timestamp cause cache misses since it changes with every request?
@Kiran_Sai_Ramineni
Hi @M.S.Darshan_Kirthic,
The TTL of the cached context depends on the value you set for the ttl argument in CreateCachedContentConfig. If not set, the TTL defaults to 1 hour.
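For example, with the google-genai Python SDK you could set it like this (the model name and contents are placeholders, and the cached content has to meet the model's minimum token count):

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

# Create an explicit cache with a custom TTL; if ttl is omitted it defaults to 1 hour.
cache = client.caches.create(
    model="gemini-2.5-flash",  # placeholder model name
    config=types.CreateCachedContentConfig(
        system_instruction="You are a helpful assistant.",  # placeholder
        contents=["<large shared context goes here>"],       # placeholder
        ttl="7200s",  # 2 hours, expressed in seconds
    ),
)

# Shorter follow-up requests then reference the cache by name,
# so the large shared context is reused instead of being resent.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="A short question about the cached context",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)
```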
M.S.Darshan_Kirthic:
In my production environment, I’m using the Gemini API through the native client (without Vertex AI) to handle at least thousands of requests per second. Even when sending requests with identical prefixes, I’m seeing cache misses. Is there a specific way to reference the cache or the chat session to improve hit rates?
If you are using explicit context caching, make sure the substantial initial context is referenced by your shorter follow-up requests.
Could you please let us know whether you are using the same timestamp or a different one in the system prompt for each call? Thank you.
Hi @Kiran_Sai_Ramineni,
I am talking about implicit caching, not explicit caching, so we cannot set the TTL of the caches that Google creates automatically.
The timestamps are different for each call because they include seconds.
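Since implicit caching matches on the request prefix, I'm assuming the safer pattern is to keep the system prompt fixed and pass the timestamp at the very end of the request instead, so the shared prefix stays byte-identical. A rough sketch of what I mean (placeholder model name and content):

```python
import datetime
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment
MODEL = "gemini-2.5-flash"                        # placeholder model name
SYSTEM_PROMPT = "You are a helpful assistant."    # fixed: no timestamp here
SHARED_CONTEXT = "<large shared prefix, identical across requests>"

def ask(question: str) -> str:
    # The timestamp goes after the shared prefix, so the prefix that
    # implicit caching matches on does not change between requests.
    now = datetime.datetime.now(datetime.timezone.utc).isoformat()
    response = client.models.generate_content(
        model=MODEL,
        contents=[SHARED_CONTEXT, f"{question}\n(current time: {now})"],
        config=types.GenerateContentConfig(system_instruction=SYSTEM_PROMPT),
    )
    return response.text
```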
Even after removing the timestamp from the system prompt, I can still see cache misses on some requests, as the log below shows.
-------- Log for token count: 1470 attempt: 1 times: 10------------------------
Is cached: False
INPUT TOKENS: 1482
CACHED TOKENS: None
Thought tokens: 100
Total tokens: 1582
-------- Log for token count: 5147 attempt: 2 times: 25------------------------
Is cached: False
INPUT TOKENS: 5159
CACHED TOKENS: None
Thought tokens: 54
Total tokens: 5214
-------- Log for token count: 12499 attempt: 3 times: 50------------------------
Is cached: True
INPUT TOKENS: 12511
CACHED TOKENS: 5136
Thought tokens: 100
Total tokens: 12611
-------- Log for token count: 27201 attempt: 4 times: 100------------------------
Is cached: True
INPUT TOKENS: 27213
CACHED TOKENS: 1028
Thought tokens: 100
Total tokens: 27313
-------- Log for token count: 56603 attempt: 5 times: 200------------------------
Is cached: True
INPUT TOKENS: 56615
CACHED TOKENS: 26776
Thought tokens: 100
Total tokens: 56715
-------- Log for token count: 93355 attempt: 6 times: 250------------------------
Is cached: False
INPUT TOKENS: 93367
CACHED TOKENS: None
Thought tokens: 100
Total tokens: 93467
-------- Log for token count: 144807 attempt: 7 times: 350------------------------
Is cached: True
INPUT TOKENS: 144819
CACHED TOKENS: 90678
Thought tokens: 95
Total tokens: 144915
-------- Log for token count: 218309 attempt: 8 times: 500------------------------
Is cached: False
INPUT TOKENS: 218321
CACHED TOKENS: None
Thought tokens: 73
Total tokens: 218395
-------- Log for token count: 306511 attempt: 9 times: 600------------------------
Is cached: True
INPUT TOKENS: 306523
CACHED TOKENS: 214381
Thought tokens: 100
Total tokens: 306623
-------- Log for token count: 424113 attempt: 10 times: 800------------------------
Is cached: False
INPUT TOKENS: 424125
CACHED TOKENS: None
Thought tokens: 54
Total tokens: 424180
-------- Log for token count: 571115 attempt: 11 times: 1000------------------------
Is cached: True
INPUT TOKENS: 571127
CACHED TOKENS: 420566
Thought tokens: 100
Total tokens: 571227
-------- Log for token count: 791617 attempt: 12 times: 1500------------------------
Is cached: False
INPUT TOKENS: 791629
CACHED TOKENS: None
Thought tokens: 100
Total tokens: 791729
Token limit exceeded
Token count: 1085619
Cache hits: 6
Cache misses: 6
END OF LOG
script used to generate the log
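I haven't pasted the full script here; essentially it is a loop like the following (simplified sketch, with a placeholder model name and filler text instead of my real prompts):

```python
from google import genai

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment
MODEL = "gemini-2.5-flash"  # placeholder model name

history = []   # growing conversation, so every request shares the previous prefix
hits = misses = 0

# "times" controls how many filler paragraphs are appended on each attempt
for attempt, times in enumerate(
        [10, 25, 50, 100, 200, 250, 350, 500, 600, 800, 1000, 1500], start=1):
    history.append("Some fixed filler paragraph. " * times)
    prompt = "\n".join(history)

    token_count = client.models.count_tokens(model=MODEL, contents=prompt).total_tokens
    if token_count > 1_000_000:
        print("Token limit exceeded")
        print(f"Token count: {token_count}")
        break

    response = client.models.generate_content(model=MODEL, contents=prompt)
    usage = response.usage_metadata
    cached = usage.cached_content_token_count  # None when nothing was served from cache

    print(f"-------- Log for token count: {token_count} attempt: {attempt} times: {times} ----")
    print(f"Is cached: {cached is not None}")
    print(f"INPUT TOKENS: {usage.prompt_token_count}")
    print(f"CACHED TOKENS: {cached}")
    print(f"Thought tokens: {usage.thoughts_token_count}")
    print(f"Total tokens: {usage.total_token_count}")

    if cached is not None:
        hits += 1
    else:
        misses += 1

print(f"Cache hits: {hits}")
print(f"Cache misses: {misses}")
print("END OF LOG")
```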
Caching has improved a lot compared to the log I took last Saturday, but I'm not sure why the cache is missed on a subsequent request right after a successful cache hit.