Hi everyone,
Busy having fun with Gemini, testing out the model's capabilities, etc.
I was super excited by Context Caching becoming available, as my use case has a large input context and an equally large output (around 140k tokens in, about 1.2x that out).
I implemented context caching in my test script and noticed a very weird issue: the output from the model seems to loop between requests.
My workflow is as follows (a minimal code sketch follows the list):
1. Create the cache with `system_instruction` and `initial_content` (being the task to perform).
2. Create a model with `GenerativeModel.from_cached_content` using the previous cache.
3. Prompt the model via `generate_content` with a single simple message (since blank contents are not allowed).
4. After each response, add the "model" output to a list of messages.
5. Add a new user message which is basically "please continue".
6. Repeat from step 3 until the model includes a predefined phrase indicating it has completed its task.
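Here's a minimal sketch of that loop using the Python SDK. The placeholder names (`SYSTEM_PROMPT`, `TASK_CONTENT`, `DONE_PHRASE`) and the model version are just stand-ins for my actual values:

```python
import datetime
import os

import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Placeholders for my actual inputs.
SYSTEM_PROMPT = "..."           # system instruction
TASK_CONTENT = "..."            # the ~140k-token task description
DONE_PHRASE = "TASK_COMPLETE"   # predefined phrase marking completion

# Step 1: create the cache with the system instruction and the task.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-pro-001",
    system_instruction=SYSTEM_PROMPT,
    contents=[{"role": "user", "parts": [TASK_CONTENT]}],
    ttl=datetime.timedelta(minutes=30),
)

# Step 2: build a model from the cached content.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)

# Steps 3-6: prompt, append the model's output, ask it to continue, repeat.
messages = [{"role": "user", "parts": ["Please begin."]}]
while True:
    response = model.generate_content(messages)
    messages.append({"role": "model", "parts": [response.text]})
    if DONE_PHRASE in response.text:
        break
    messages.append({"role": "user", "parts": ["Please continue."]})
```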
I am 100% sure steps 4 and 5 are correctly adding messages to be passed to the model (I dumped the input to a file and manually checked it).
The problem is: when step 3 runs again, it produces nearly identical output, as if the model isn't considering the newly added messages from the previous turns. It's stuck in a loop.
This only happens when using Context Caching, and it happens for both the pro and flash models.
Edit: Adding this:
I also tried not using a `system_instruction`, instead just modifying the initial message to the model to include my system prompt. The issue remains: when using caching, the model gets stuck in a loop.
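For reference, that variant looked roughly like this (same placeholder names as in the sketch above), with the system prompt folded into the cached initial message:

```python
# Same as above, but without system_instruction: the system prompt is
# prepended to the cached task content instead.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-pro-001",
    contents=[{"role": "user", "parts": [SYSTEM_PROMPT + "\n\n" + TASK_CONTENT]}],
    ttl=datetime.timedelta(minutes=30),
)
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
```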