Hello everyone,
I’m developing a production application using the Gemini 2.0 Flash-Lite model (gemini-2.0-flash-lite-001) and I’m looking into the Explicit Context Caching feature for cost optimization.
I’ve reviewed the official documentation, and I understand that the cost of this cache is based on the amount of cached tokens and the duration (TTL). However, I haven’t been able to find the specific pricing details for the cache storage itself (not the cost of using the cached tokens in a prompt, but the cost of keeping them stored).
I need this information to accurately estimate my operational costs when scaling my application.
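To make the estimate concrete, here is a minimal sketch of how the storage cost could be modelled, assuming storage is billed in proportion to cached tokens multiplied by the hours they stay cached. The function name and the rate used below are placeholders for illustration, not published prices; the real rate has to come from the official pricing page.

```python
def estimate_cache_storage_cost(
    cached_tokens: int,
    ttl_hours: float,
    price_per_million_token_hours: float,
) -> float:
    """Estimate the cost of keeping tokens in an explicit cache.

    Assumes billing is linear in (tokens x hours). The rate is supplied
    by the caller; no real price is hard-coded here.
    """
    if cached_tokens < 0 or ttl_hours < 0 or price_per_million_token_hours < 0:
        raise ValueError("all inputs must be non-negative")
    token_hours = cached_tokens * ttl_hours
    return token_hours / 1_000_000 * price_per_million_token_hours


# Hypothetical rate, for illustration only (NOT an actual Gemini price).
PLACEHOLDER_RATE = 1.00  # currency units per 1M tokens per hour

# 500k cached tokens kept for 24 hours at the placeholder rate.
cost = estimate_cache_storage_cost(500_000, 24, PLACEHOLDER_RATE)
print(f"Estimated storage cost: {cost:.2f}")
```

Swapping in the actual per-hour storage rate and the expected TTL gives a first-order figure; it does not cover the separate charge for using cached tokens in a prompt.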
Could anyone in the community who has experience with this functionality please help clarify:
- What is the specific cost metric for storing cached tokens (e.g., cost per cached token per hour or day)?
- What is the maximum allowable duration (TTL) that can be set for an explicit CachedContent resource?
Any pointers to the right documentation section or personal experience with these costs would be greatly appreciated.
Thanks in advance for your help!
Best regards,
Carlos
Hi @Carlos_Orzabal, as mentioned in this documentation, context caching is not available for gemini-2.0-flash-lite. It is available for the Gemini 2.0 Flash and 2.5 models. You can check this document for pricing info. Thank you.
Hello @Kiran_Sai_Ramineni,
Thank you again for your prompt response and the links to the documentation. I truly appreciate it!
I’m still a bit confused regarding the caching availability for gemini-2.0-flash-lite. In the official documentation, under the properties listed for the models/gemini-2.0-flash-lite model, it explicitly states:
Caching: Supported
Given that you mentioned that context caching isn’t available for this model, could you please clarify what type of “Caching” the documentation is referring to as ‘Supported’ for gemini-2.0-flash-lite?
Also, while researching for production use, another question arose about the limits for gemini-2.0-flash-lite. We noticed that in the quotas documentation, while other models list a Requests Per Day (RPD) limit, for gemini-2.0-flash-lite the RPD column appears as “–” across all levels.
Could you confirm whether this means there truly isn’t a daily limit (RPD) for gemini-2.0-flash-lite in the paid tiers, or if this daily limit information can be found in a different section?
Apologies for the multiple questions, and thank you again for your time and assistance in clarifying these points, which are important for planning production usage.
Best regards,
Carlos
Hi @Carlos_Orzabal, apologies for the confusion. I have tested the context caching feature with the 2.0 Flash Lite model using sample code, and I can see that it works with this model. I think the document needs to be updated; I will create a CL for this documentation update.
Regarding the RPD rate limits for the paid tier, I will check with the engineering team. Thank you.
Hi @Kiran_Sai_Ramineni,
No problem at all, thanks for clarifying!
Just to confirm, does this mean the context caching feature in 2.0 Flash Lite works exactly the same way as in the standard Gemini 2.0 Flash model? That would be great news.
Regarding the rate limits for paid tiers (RPD), I’ll be looking forward to your update from the engineering team.
Thanks a lot for your help and for taking the time to test this!