On the page it is mentioned: "The minimum input token count for context caching is 32,768."
Can I cache less than 32K and then pay for the minimum? Or what happens otherwise?
Thanks!
Hi @Lior_Trieman. Currently, you can't cache less than 32,768 tokens. If you try with fewer tokens, you will get the error below:
BadRequest: 400 POST https://generativelanguage.googleapis.com/v1beta/cachedContents?%24alt=json%3Benum-encoding%3Dint: Cached content is too small.
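For illustration, here is a minimal sketch of the kind of call that triggers this error, using the Python google-generativeai SDK (the API key and text are placeholders):

```python
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")  # placeholder

# Well under the 32,768-token minimum, so cache creation is rejected.
short_text = "Just a few sentences of context."

# Expected to raise:
# BadRequest: 400 ... Cached content is too small.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    display_name="too-small-cache",
    contents=[short_text],
)
```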
Is there any way for Gemini to remember your past conversations? Paying for 32K tokens is too much. I want the conversations to be remembered.
There are two ways for Gemini to have memory, aside from you storing it client-side and sending it with every prompt.
One is caching, which is free for up to 1 million tokens stored per hour, assuming you're using 1.5 Flash. If you exceed that storage, you will pay a fee beyond it. So the 32k minimum is a freebie (see here for reference: Gemini API Pricing | Google AI for Developers). You can specify the TTL in the API to control how long the content is stored. This saves a ton of money when you reference a lot of in-house custom data during the conversation, since you don't have to feed it in with every single prompt. A sketch of what this looks like follows below.
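As a rough sketch with the Python SDK (the file name, display name, and prompt are placeholders): you create the cache once with a TTL, then query against it without resending the data.

```python
import datetime
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")  # placeholder

# Upload a large reference document (the total must be >= 32,768 tokens).
document = genai.upload_file(path="big_reference.txt")

# Cache it for one hour; storage beyond the free tier is billed per hour.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    display_name="in-house-data",
    contents=[document],
    ttl=datetime.timedelta(hours=1),
)

# Build a model backed by the cache and query it without resending the data.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("Summarize section 3 of the reference.")
print(response.text)
```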
Second is tuning your own model with a structured prompt using AI Studio, which is for much longer-term memory. This will pretty much be static data that you don't plan on changing. Tuning is currently priced the same as input tokens, and the tuning service itself is free of charge.
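If you'd rather do this through the API than the AI Studio UI, a hedged sketch of the equivalent tuning flow looks like the following (the source model, example data, and model id here are assumptions for illustration, not a recommended setup):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder

# Kick off a tuning job from a small set of input/output examples.
operation = genai.create_tuned_model(
    source_model="models/gemini-1.0-pro-001",
    training_data=[
        {"text_input": "What is our return policy?",
         "output": "30 days, no questions asked."},
        {"text_input": "Where is support located?",
         "output": "Austin, Texas."},
    ],
    id="my-memory-model",
    epoch_count=5,
)

# Wait for the job to finish, then query the tuned model like any other.
result = operation.result()
model = genai.GenerativeModel(model_name=result.name)
print(model.generate_content("What is our return policy?").text)
```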
Hello Jami. Can you give me a hint on how to create my own model and query it?
Thanks in advance!
This is very much required.