Query: Gemini 2.0 Flash-Lite Explicit Caching Costs and Max TTL Limit

Carlos_Orzabal · May 16, 2025, 5:27am

Hello everyone,

I’m developing a production application using the Gemini 2.0 Flash-Lite model (gemini-2.0-flash-lite-001) and I’m looking into the Explicit Context Caching feature for cost optimization.

I’ve reviewed the official documentation, and I understand that the cost of this cache is based on the amount of cached tokens and the duration (TTL). However, I haven’t been able to find the specific pricing details for the cache storage itself (not the cost of using the cached tokens in a prompt, but the cost of keeping them stored).

I need this information to accurately estimate my operational costs when scaling my application.

Could anyone in the community who has experience with this functionality please help clarify:

What is the specific cost metric for storing cached tokens (e.g., cost per cached token per hour or day)?
What is the maximum allowable duration (TTL) that can be set for an explicit CachedContent resource?
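
To make the storage question concrete, the arithmetic can be sketched as below. This is only an illustration: it assumes storage is billed per million cached tokens per hour of TTL, and the rate passed in is a placeholder rather than a published price for gemini-2.0-flash-lite. The function name and parameters are hypothetical.

```python
def cache_storage_cost(cached_tokens: int,
                       ttl_hours: float,
                       rate_per_million_token_hours: float) -> float:
    """Estimate the cost of keeping tokens cached for a given TTL.

    Assumes billing is linear in both token count and storage time
    (an assumption; confirm against the official pricing page).
    """
    if cached_tokens < 0 or ttl_hours < 0 or rate_per_million_token_hours < 0:
        raise ValueError("inputs must be non-negative")
    return (cached_tokens / 1_000_000) * ttl_hours * rate_per_million_token_hours


# Placeholder numbers only: 200k cached tokens kept for 24 hours
# at a made-up rate of 1.00 per million tokens per hour.
estimate = cache_storage_cost(200_000, 24, 1.00)
print(f"Estimated storage cost: {estimate:.2f}")
```

Swapping in the real storage rate from the pricing page, once confirmed for the model in question, gives the per-cache figure to multiply by the number of concurrently live caches.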

Any pointers to the right documentation section or personal experience with these costs would be greatly appreciated.

Thanks in advance for your help!

Best regards,

Carlos

Kiran_Sai_Ramineni · May 16, 2025, 6:51am

Hi @Carlos_Orzabal, as mentioned in this documentation, context caching is not available for gemini-2.0-flash-lite. It is available for the Gemini 2.0 Flash and 2.5 models. You can check this document for pricing info. Thank you.

Carlos_Orzabal · May 16, 2025, 4:52pm

Hello @Kiran_Sai_Ramineni,

Thank you again for your prompt response and the links to the documentation. I truly appreciate it!

I’m still a bit confused regarding the caching availability for gemini-2.0-flash-lite. In the official documentation under the properties listed for the models/gemini-2.0-flash-lite model, it explicitly states:

Caching: Supported

Given that you mentioned that context caching isn’t available for this model, could you please clarify what type of “Caching” the documentation is referring to as ‘Supported’ for gemini-2.0-flash-lite?

Also, while researching for production use, another question arose about the limits for gemini-2.0-flash-lite. We noticed that in the quotas documentation, while other models list a Requests Per Day (RPD) limit, for gemini-2.0-flash-lite the RPD column appears as “–” across all levels.

Could you confirm whether this means there truly isn’t a daily limit (RPD) for gemini-2.0-flash-lite in the paid tiers, or if this daily limit information can be found in a different section?

Apologies for the multiple questions, and thank you again for your time and assistance in clarifying these points, which are important for planning production usage.

Best regards,

Carlos

Kiran_Sai_Ramineni · May 20, 2025, 5:32am

Hi @Carlos_Orzabal, apologies for the confusion. I tested the context caching feature with the 2.0 Flash-Lite model using sample code, and I can see that it works with Gemini 2.0 Flash-Lite. I think the documentation needs to be updated; I will create a CL for this document update.
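
The sample code used for that test is not included in the thread. A minimal sketch of such a check, assuming the `google-genai` Python SDK is installed and an API key is configured in the environment, might look like the following. The helper names, the one-hour TTL, and the prompt are illustrative assumptions, and the API may reject content that falls below the model's minimum cacheable token count.

```python
def ttl_seconds(hours: float) -> str:
    """Format a TTL in hours as the seconds string the API expects, e.g. '3600s'."""
    if hours <= 0:
        raise ValueError("TTL must be positive")
    return f"{int(round(hours * 3600))}s"


def try_explicit_cache(long_context: str, question: str,
                       model: str = "gemini-2.0-flash-lite-001",
                       ttl_hours: float = 1.0):
    """Create an explicit cache, then answer one question against it.

    Hypothetical helper; requires network access and a valid API key.
    """
    # Imported lazily so the TTL helper above works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()
    cache = client.caches.create(
        model=model,
        config=types.CreateCachedContentConfig(
            contents=[long_context],
            ttl=ttl_seconds(ttl_hours),
        ),
    )
    response = client.models.generate_content(
        model=model,
        contents=question,
        config=types.GenerateContentConfig(cached_content=cache.name),
    )
    # A non-zero cached token count suggests the cache was actually used.
    return response.text, response.usage_metadata.cached_content_token_count


# Example call (not run here because it needs credentials):
# text, cached_tokens = try_explicit_cache(open("manual.txt").read(), "Summarize section 2.")
```

If the create call succeeds and the reported cached token count is non-zero, that is reasonable evidence that explicit caching works for the model, whatever the documentation table currently says.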

Regarding the RPD rate limits for the paid tier, I will check with the engineering team. Thank you.

Carlos_Orzabal · May 20, 2025, 6:30am

Hi @Kiran_Sai_Ramineni,

No problem at all, thanks for clarifying!

Just to confirm, does this mean the context caching feature in 2.0 Flash-Lite works exactly the same way as in the standard Gemini 2.0 Flash model? That would be great news.

Regarding the rate limits for paid tiers (RPD), I’ll be looking forward to your update from the engineering team.

Thanks a lot for your help and for taking the time to test this!
