Query: Gemini 2.0 Flash-Lite Explicit Caching Costs and Max TTL Limit

Hello everyone,

I’m developing a production application using the Gemini 2.0 Flash-Lite model (gemini-2.0-flash-lite-001) and I’m looking into the Explicit Context Caching feature for cost optimization.

I’ve reviewed the official documentation, and I understand that the cost of this cache is based on the amount of cached tokens and the duration (TTL). However, I haven’t been able to find the specific pricing details for the cache storage itself (not the cost of using the cached tokens in a prompt, but the cost of keeping them stored).

I need this information to accurately estimate my operational costs when scaling my application.
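
For reference, the kind of estimate in question can be sketched as below. This is a minimal sketch, assuming storage is billed per million cached tokens per hour. The function name and the rate are hypothetical placeholders, not published prices.

```python
def estimate_cache_storage_cost(cached_tokens: int,
                                ttl_hours: float,
                                rate_per_million_tokens_per_hour: float) -> float:
    """Estimate the cost of keeping tokens cached for a given TTL.

    Assumed billing model (not confirmed by the docs quoted in this thread):
    cost = (cached_tokens / 1,000,000) * ttl_hours * rate
    """
    if cached_tokens < 0 or ttl_hours < 0 or rate_per_million_tokens_per_hour < 0:
        raise ValueError("inputs must be non-negative")
    return cached_tokens / 1_000_000 * ttl_hours * rate_per_million_tokens_per_hour


# Placeholder rate for illustration only; replace with the real storage price.
PLACEHOLDER_RATE = 1.00

# Example: 500,000 cached tokens kept for 24 hours at the placeholder rate.
print(estimate_cache_storage_cost(500_000, 24, PLACEHOLDER_RATE))
```

With the real storage rate and the maximum TTL filled in, the same function would give the per-cache storage cost to multiply by the number of caches kept alive.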

Could anyone in the community who has experience with this functionality please help clarify:

  1. What is the specific cost metric for storing cached tokens (e.g., cost per cached token per hour or day)?

  2. What is the maximum allowable duration (TTL) that can be set for an explicit CachedContent resource?

Any pointers to the right documentation section or personal experience with these costs would be greatly appreciated.

Thanks in advance for your help!

Best regards,

Carlos

Hi @Carlos_Orzabal, as mentioned in this documentation, context caching is not available for gemini-2.0-flash-lite. It is available for Gemini 2.0 Flash and 2.5 models. You can check this document for pricing info. Thank you.


Hello @Kiran_Sai_Ramineni,

Thank you again for your prompt response and the links to the documentation. I truly appreciate it!

I’m still a bit confused regarding the caching availability for gemini-2.0-flash-lite. In the official documentation under the properties listed for the models/gemini-2.0-flash-lite model, it explicitly states:

Caching: Supported

Given that you mentioned that context caching isn’t available for this model, could you please clarify what type of “Caching” the documentation is referring to as ‘Supported’ for gemini-2.0-flash-lite?

Also, while researching for production use, another question arose about the limits for gemini-2.0-flash-lite. We noticed that in the quotas documentation, while other models list a Requests Per Day (RPD) limit, for gemini-2.0-flash-lite the RPD column appears as “” across all levels.

Could you confirm whether this means there truly isn’t a daily limit (RPD) for gemini-2.0-flash-lite in the paid tiers, or if this daily limit information can be found in a different section?

Apologies for the multiple questions, and thank you again for your time and assistance in clarifying these points, which are important for planning production usage.

Best regards,

Carlos