My prompt consists of a fixed part of about 3000 tokens and a variable part of about 1000 tokens.
In this case, can I benefit from cache hits and enjoy API cost discounts?
Hi @hong_jackey
Welcome to the AI Forum!
Yes. Your fixed 3000-token portion can be cached, while the variable 1000-token part will always be billed at the normal rate. To get the most savings, make sure the fixed part comes first in your prompt, so that every request shares an identical prefix.
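As a minimal sketch of what "fixed part first" means in practice (the names `FIXED_INSTRUCTIONS` and `build_prompt` are illustrative, not part of any real API): keep the static instructions as an unchanging prefix and append only the per-request content after it, so repeated calls present the same cacheable prefix to the provider.

```python
# Illustrative sketch: place the static ~3000-token block first so
# repeated requests share an identical, cacheable prefix.
# FIXED_INSTRUCTIONS and build_prompt are hypothetical names.

FIXED_INSTRUCTIONS = (
    "You are a helpful assistant.\n"
    "Follow the style guide below when answering.\n"
    # ... the rest of the ~3000 tokens of static instructions ...
)

def build_prompt(variable_part: str) -> str:
    """Static prefix first; only the tail varies per request."""
    return FIXED_INSTRUCTIONS + "\n" + variable_part

# Two different requests share the same cacheable prefix:
p1 = build_prompt("Summarize document A.")
p2 = build_prompt("Summarize document B.")
assert p1[: len(FIXED_INSTRUCTIONS)] == p2[: len(FIXED_INSTRUCTIONS)]
```

If any per-request detail (for example, a timestamp or user ID) were placed before the fixed block, the prefixes would differ on every call and no cache hit would occur.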
For more details, please refer to this doc.