Hi everyone!
I'm running into some trouble with the Gemini Live models (specifically gemini-2.5-flash-live) due to the way they handle their built-in context window. Basically, my setup is the following: I'm building an in-call AI assistant that uses Gemini Live to produce realtime answers to the questions the client asks. It could be the only AI engine in the system, but the problem appears when the Gemini Live model produces erratic answers, or outputs that are not aligned with the instructions it received in its system prompt.
To mitigate that, I have implemented a two-websocket approach. My microservice uses one websocket for the assistant and one for a second Gemini Live session acting as an evaluator. The evaluator takes the client's audio input and the assistant's output transcription (text) and has to judge whether the assistant's response was good or not. Note that the first websocket's RESPONSE_MODALITIES is audio and the second websocket's RESPONSE_MODALITIES is text (because the evaluator has to respond with JSON following a schema).
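In case it helps, here is a rough sketch of the two session configs. I'm using plain dicts here to stand in for the LiveConnectConfig I actually pass to the google-genai SDK, and the system instructions are shortened, so treat the exact shape as illustrative:

```python
# Simplified shape of the two configs for client.aio.live.connect()
# (google-genai SDK). Only the fields relevant to my problem are shown.

ASSISTANT_CONFIG = {
    "response_modalities": ["AUDIO"],  # the assistant speaks to the caller
    "system_instruction": "You are an in-call assistant. Help the client...",
}

EVALUATOR_CONFIG = {
    "response_modalities": ["TEXT"],  # the evaluator must answer with JSON text
    "system_instruction": (
        "You are an evaluator. Given the client's audio and the assistant's "
        'transcribed answer, return JSON: {"correct": <bool>, "reason": <str>}.'
    ),
}

# Both sessions are opened roughly like this (async, one websocket each):
#
#   async with client.aio.live.connect(model="gemini-2.5-flash-live",
#                                      config=ASSISTANT_CONFIG) as assistant:
#       ...
#   async with client.aio.live.connect(model="gemini-2.5-flash-live",
#                                      config=EVALUATOR_CONFIG) as evaluator:
#       ...
```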
The problem I'm getting is the following. The assistant Gemini Live model receives the system instruction (text) plus the client's audio input, and it responds properly (not always, but mostly). The evaluator Gemini Live session, however, is not performing as expected. It receives the system instruction (text), the client's audio input, and the assistant's output (text), but it produces JSON evaluations for previous turns. For example, when the conversation starts I can say "Hello, help me find a laptop on the internet"; the assistant gives me step-by-step instructions to achieve my goal (correct), and the evaluator evaluates it properly, maybe outputting {"correct": true, "reason": "The client asked for a laptop on the internet and the assistant correctly guided them step by step..."}. But some iterations later, when the client says "Now, I want you to tell me who Lionel Messi is.", the assistant produces "Yes! Lionel Messi is a really famous footballer..." and the evaluator responds incorrectly with {"correct": true, "reason": "As the client requested a laptop on the internet, the assistant did what it had to and correctly guided them step by step..."}.
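To make the failure concrete, this is a small heuristic I hacked together on my side (my own workaround code, nothing from the SDK) to flag evaluations whose "reason" shares no content words with the current client request, which is a hint the evaluation refers to an earlier turn:

```python
import json


def evaluation_is_stale(eval_text: str, current_request: str) -> bool:
    """Heuristic: flag an evaluation whose 'reason' shares no content words
    (longer than 3 chars) with the current client request."""
    data = json.loads(eval_text)
    reason_words = set(data["reason"].lower().split())
    request_words = {w for w in current_request.lower().split() if len(w) > 3}
    return not (reason_words & request_words)


# The stale evaluation from my example above is caught:
stale = evaluation_is_stale(
    '{"correct": true, "reason": "As the client requested a laptop..."}',
    "Now, I want you to tell me who Lionel Messi is.",
)  # stale == True
```

Of course this only detects the symptom after the fact; it doesn't explain why the evaluator is stuck on the old turn.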
As you can see, it seems the evaluator model is latching onto earlier context, and I don't know whether the problem comes from (1) the way I'm sending the inputs to the evaluator or (2) the way Gemini Live manages its context window. One approach I think would work is clearing the model's context, but as far as I can tell that cannot be done with this kind of model, can it? I'd really appreciate any questions that help you understand my use case better, as well as proposed solutions.
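Since I haven't found an explicit "clear context" call in the Live API, one workaround I'm considering is reopening the evaluator session per turn so it only ever sees the current exchange, and packaging that exchange as a single explicitly labeled turn. A sketch of the message builder (the payload shape loosely follows what I pass to send_client_content; the field names and labels are my assumption, not an official schema):

```python
def build_evaluator_turn(client_transcript: str, assistant_transcript: str) -> dict:
    """Package ONLY the current exchange as one user turn for the evaluator,
    so a freshly opened session has nothing older to latch onto."""
    return {
        "turns": [{
            "role": "user",
            "parts": [{
                "text": (
                    "CLIENT SAID: " + client_transcript + "\n"
                    "ASSISTANT ANSWERED: " + assistant_transcript + "\n"
                    "Evaluate ONLY this exchange."
                )
            }],
        }],
        "turn_complete": True,
    }


# Per turn, roughly:
#   async with client.aio.live.connect(model=..., config=EVALUATOR_CONFIG) as ev:
#       await ev.send_client_content(**build_evaluator_turn(client_txt, asst_txt))
```

Reconnecting per turn obviously adds latency, so I'd prefer a way to scope or reset the context within one session if that exists.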
Thanks in advance!