Context Caching causes Gemini 1.5 to get stuck in a loop

Hi everyone,

I've been having fun with Gemini, testing out the model's capabilities, etc.
I was super excited when Context Caching became available, as my use case has a large input context and an equally large output (around 140k tokens in, about 1.2x that out).

I implemented context caching in my test script and noticed a very weird issue: the output from the model seems to loop between requests.

My workflow is as follows (a trimmed code sketch follows the list):

  1. create a cache with the system_instruction and initial_content (the task to perform)
  2. create a model with GenerativeModel.from_cached_content using the previous cache
  3. prompt the model via generate_content with a single simple message (since blank contents is not allowed)
  4. after each response, add the “model” output to a list of messages
  5. add a new user message which is basically “please continue”
  6. repeat from step 3 until the model includes a predefined phrase indicating it has completed its task
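
Trimmed down, the test script looks roughly like this (the model version, system prompt, task content, TTL, and completion phrase are placeholders, not my real values):

```python
import datetime
import os

import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

SYSTEM_INSTRUCTION = "..."      # placeholder for my real system prompt
INITIAL_CONTENT = "..."         # placeholder for the ~140k-token task description
DONE_PHRASE = "TASK COMPLETE"   # placeholder for my predefined completion phrase

# Step 1: create the cache with the system instruction and the task content.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-pro-001",
    system_instruction=SYSTEM_INSTRUCTION,
    contents=[{"role": "user", "parts": [INITIAL_CONTENT]}],
    ttl=datetime.timedelta(hours=1),
)

# Step 2: create the model from the cached content.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)

# Steps 3-6: prompt, append the model's output, ask it to continue, repeat.
messages = [{"role": "user", "parts": ["Please begin."]}]  # blank contents is not allowed
while True:
    response = model.generate_content(messages)
    text = response.text
    messages.append({"role": "model", "parts": [text]})    # step 4
    if DONE_PHRASE in text:                                 # step 6 exit condition
        break
    messages.append({"role": "user", "parts": ["Please continue."]})  # step 5
```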

I am 100% sure steps 4 and 5 are correctly adding messages to be passed to the model (I dumped the input to a file and manually checked it).

The problem is: when step 3 runs again, it produces nearly identical output, as if the model isn't considering the newly added messages from its previous output. It's stuck in a loop.

This only happens when using Context Caching, and it happens for both the pro and flash models.

Edit:
I also tried not using a system_instruction, instead modifying the initial message to the model to include my system prompt. The issue remains: when using caching, the model gets stuck in a loop.
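
In code terms (reusing the placeholders from the sketch above), that variant just drops system_instruction and folds the prompt into the cached contents:

```python
cache = caching.CachedContent.create(
    model="models/gemini-1.5-pro-001",
    contents=[{"role": "user", "parts": [SYSTEM_INSTRUCTION + "\n\n" + INITIAL_CONTENT]}],
    ttl=datetime.timedelta(hours=1),
)
```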
