Critical bug: Vertex API with context cache leaks prompt state between generateContent calls

We have found what we believe is a serious bug when an explicit context cache is used together with prompts containing images on the Vertex API. We have a complex system with moderately sized (~5000-token) prompts that we are migrating from the Gemini API to the Vertex API, and found that post-migration, generate_content calls were producing ‘nonsensical’ responses that appeared to answer prompts we had supplied earlier.

We have put together a reproducer in Python that is as simple as possible (~100 lines of code) and reproduces the bug 100% of the time. It first creates an explicit context cache containing ~2500 tokens (‘lorem ipsum’ repeated). It then makes a generate_content call with an image of a cat in the prompt, asking the model to identify it, followed by a second generate_content call with an image of a dog in the prompt, again asking the model to identify it. When the Vertex API is used with a context cache (the contents of the cache don’t matter, it just has to be present), the second generate_content call produces what looks like a response to the earlier prompt (i.e. it says ‘cat’ even though the prompt contains only a dog). Correct responses are produced by the Vertex API when no context cache is used, and by the Gemini API both with and without a context cache.
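For reference, here is a condensed sketch of the flow reproducer.py follows, written against the google-genai Python SDK. It is not the attached script verbatim; SDK call names (caches.create, CreateCachedContentConfig, cached_content, Part.from_bytes) are from the current SDK and the exact shapes may differ slightly from the attachment:

```python
import os

# ~2500 words of filler for the cache. The cache contents are irrelevant to
# the bug; the cache just has to be attached to the request.
LOREM = "lorem ipsum dolor sit amet " * 500

def identify_animals(client, cache_name=None):
    """Two independent generate_content calls: a cat image, then a dog image."""
    from google.genai import types  # deferred so the sketch reads standalone
    for image_path in ("cat.jpg", "dog.jpg"):
        with open(image_path, "rb") as f:
            image = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")
        response = client.models.generate_content(
            model="gemini-2.5-flash-lite",
            contents=[image, "Identify the animal in one word."],
            config=types.GenerateContentConfig(cached_content=cache_name),
        )
        # Expected: Cat, then Dog. With the Vertex client plus a cache we
        # instead see Cat printed twice.
        print(response.text)

def main():
    from google import genai
    from google.genai import types
    # Vertex client; the reproducer also builds a Gemini API client with
    # genai.Client(api_key=os.environ["MY_GEMINI_API_KEY"]) for comparison.
    client = genai.Client(
        vertexai=True,
        project=os.environ["MY_VERTEX_PROJECT_ID"],
        location="us-central1",  # fails every time here; only sometimes on ‘global’
    )
    cache = client.caches.create(
        model="gemini-2.5-flash-lite",
        config=types.CreateCachedContentConfig(contents=[LOREM]),
    )
    identify_animals(client, cache_name=cache.name)

# Only attempt live calls when credentials are configured.
if os.environ.get("MY_VERTEX_PROJECT_ID"):
    main()
```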

Observations:

  • The issue seems to be somewhat region-dependent: it fails every time on us-central1 but only sometimes on ‘global’.
  • The issue only seems to happen with prompts containing images (and only for images above a certain size).
  • The issue is not model-specific: the reproducer uses gemini-2.5-flash-lite, but it occurs with other models too.
  • We have examined the REST API calls being made and they contain the correct data, so this looks like a backend issue, not an SDK issue.

I would appreciate you looking into this. It is a blocker for us because it means we can’t use the Vertex API and context caching together.

Steps to reproduce

The reproducer demonstrates things working on the Gemini API and breaking on the Vertex API, so to run it you need both a Gemini API key and a project/service account with the Vertex AI API enabled. Set these environment variables:

GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service-account-keys.json
MY_VERTEX_PROJECT_ID=project-id
MY_GEMINI_API_KEY=AIza...

Then unpack and run the attachment (contains reproducer.py, cat.jpg and dog.jpg):

tar xvfz reproducer.tar.gz
python reproducer.py

The reproducer prints:

Model is prompted with a cat image only and asked to identify, then separately prompted with a dog image only and asked to identify.
Each test should print 'Cat' followed by 'Dog'.

Testing with Gemini API (no context cache):
Cat
Dog

Testing with Vertex API (no context cache):
Cat
Dog

Testing with Gemini API (using context cache):
Cat
Dog

Testing with Vertex API (using context cache):
Cat
Cat

The output should be Cat/Dog every time. No reference to a cat is made in the second generate_content call, so in the Vertex API + context cache case, state from the first generate_content call is somehow leaking into the second, which sounds pretty serious.

reproducer.tar.gz

Thanks so much in advance for looking into this!

-Adrian


+1. We have also been experiencing this recently. We see the problem even when images are not present in the prompt but other file/MIME types, such as PDFs, are.

This seems like a critical issue?!