On the page it is mentioned: "The minimum input token count for context caching is 32,768."
Can I cache less than 32K and then pay for the minimum? Or what happens otherwise?
Thanks!
Hi @Lior_Trieman. Currently, you can't cache less than 32,768 tokens. If you try with fewer tokens, you will get the error below:
BadRequest: 400 POST https://generativelanguage.googleapis.com/v1beta/cachedContents?%24alt=json%3Benum-encoding%3Dint: Cached content is too small.
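For illustration, here is a minimal sketch of the kind of call that triggers this error, using the Python google-generativeai SDK (the API key and text are placeholders):

```python
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")  # placeholder

# Well under the 32,768-token minimum, so cache creation is rejected.
short_text = "Just a few sentences of context."

# Expected to raise:
# BadRequest: 400 ... Cached content is too small.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    display_name="too-small-cache",
    contents=[short_text],
)
```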
Is there any way for Gemini to remember your past conversations? Paying for 32K tokens is too much. I want the conversations to be remembered.
There are two ways for Gemini to have memory, aside from you storing it client-side and sending it with every prompt.
One is caching, which is free for up to 1 million tokens stored per hour, assuming you're using 1.5 Flash. If you exceed that storage, you will pay a fee beyond it. So the 32k minimum is a freebie (see here for reference: Gemini API Pricing | Google AI for Developers). You can specify the TTL in the API to control how long the content is stored. This saves a ton of money when you reference a lot of in-house custom data during the conversation, since you don't have to feed it in with every single prompt. A sketch of what this looks like follows below.
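As a rough sketch with the Python SDK (the file name, display name, and prompt are placeholders): you create the cache once with a TTL, then query against it without resending the data.

```python
import datetime
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")  # placeholder

# Upload a large reference document (the total must be >= 32,768 tokens).
document = genai.upload_file(path="big_reference.txt")

# Cache it for one hour; storage beyond the free tier is billed per hour.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    display_name="in-house-data",
    contents=[document],
    ttl=datetime.timedelta(hours=1),
)

# Build a model backed by the cache and query it without resending the data.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("Summarize section 3 of the reference.")
print(response.text)
```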
Second is tuning your own model with a structured prompt using AI Studio, which is for much longer-term memory. This will pretty much be static data that you don't plan on changing. Tuning is currently priced the same as input tokens, and the tuning service itself is free of charge.
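If you'd rather do this through the API than the AI Studio UI, a hedged sketch of the equivalent tuning flow looks like the following (the source model, example data, and model id here are assumptions for illustration, not a recommended setup):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder

# Kick off a tuning job from a small set of input/output examples.
operation = genai.create_tuned_model(
    source_model="models/gemini-1.0-pro-001",
    training_data=[
        {"text_input": "What is our return policy?",
         "output": "30 days, no questions asked."},
        {"text_input": "Where is support located?",
         "output": "Austin, Texas."},
    ],
    id="my-memory-model",
    epoch_count=5,
)

# Wait for the job to finish, then query the tuned model like any other.
result = operation.result()
model = genai.GenerativeModel(model_name=result.name)
print(model.generate_content("What is our return policy?").text)
```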
Hello Jami. Can you give me a hint on how to create my own model and query it?
Thanks in advance!
This is very much required.