Context memory problem

Hello.

Subject of the issue: The model tends to focus on the user’s last message. This isn’t just a Gemini problem; it’s an issue with all major models.

Detailed explanation: The model concentrates so much on the user’s last message that it practically ignores the context of the entire conversation, to the point where it can get stuck in loops and contradict itself.

Example 1: When the model is helping with coding and we reject its suggested approach in a given case, sometimes after a few messages, or even just one, it suggests the same solutions again. Of course, the model will apologize if we point out the repetition, but that’s even more irritating, because we expect correct behavior, not apologies.

Example 2: Many people use the Gemini model to help with writing, for example stories or novels. I often watch writers and amateur writers on YouTube using AI models to improve a chapter or plot thread of their novel. In this kind of creative work, situations arise where the model is so focused on the last message that it contradicts earlier fragments that it created itself or received from the user.

Consequences: When the user receives the same message two or three times, or sees contradictions with the overall context, it leads to irritation and a switch to another model. Personally, I do exactly that: when the model suggests the same thing to me a second time, I know I will find the solution I’m looking for faster with the competition.

Solution: Add a “context focus” option so that the model gives previous information the same weight as the user’s last message, or perhaps even more weight. When a user creating, for example, a chapter of a novel has moved on to the next plot thread, it can be assumed that they have approved the previous one, so the model should all the more generate content consistent with what has already been written and not contradict it in new messages.

My previous suggestions about the model sticking to its role were incorporated into LearnLM, so I hope someone will read this too. The competition does not provide free access to a playground; Google could use this to offer better conditions for people who expect more from AI than a regular chat, and attract new users in the process.

5 Likes

Are you still experiencing the same issue with the newer model as well?

Unfortunately, yes. While gemini-2.5-pro is admittedly an excellent model, perhaps the best one, there are still some strange behaviors when it comes to context.

In coding, it’s hard to notice such subtle errors, but in creative writing it’s much more apparent.

  1. It constantly loses information; it’s a bit like a sieve.
  2. It can contradict itself, even within a single generated fragment.
  3. There are strange breaking points between 110k and 140k tokens, where the model can generate complete nonsense, forgetting the entire conversation context.

Describing the entire phenomenon, along with its presumed causes, would take me several hours.

For Gemini 2.5 Pro, the problem manifests in the middle part of the context: the start and the end of the content are handled well, while much less attention is given to the middle.

You can find more information on the “lost-in-the-middle” phenomenon in large language models (even YouTube videos) by entering the search “attention deficit in the middle of llm context window” in the Google search box. The AI Overview you get is informative enough.
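
A common workaround is structural: place the material the model must not contradict at the start and the end of the prompt rather than burying it in the middle. A minimal sketch in Python; the helper and its section labels are purely illustrative, not part of any API.

```python
def build_prompt(key_facts: list[str], body: str, question: str) -> str:
    """Keep critical facts out of the middle of the context window,
    where attention tends to be weakest ("lost in the middle")."""
    facts = "\n".join(f"- {fact}" for fact in key_facts)
    return (
        "KEY FACTS (must not be contradicted):\n" + facts + "\n\n"
        "REFERENCE MATERIAL:\n" + body + "\n\n"
        # Repeat the key facts near the end, where attention is stronger
        # than in the middle of a long context.
        "REMINDER OF KEY FACTS:\n" + facts + "\n\n"
        "TASK:\n" + question
    )
```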

2 Likes

Big thanks for the information. I’ll take a look at those things right away. I hope there’s some way to solve this problem, because waiting several years for Gemini 6.0 would be terrible.

Unfortunately, I’m not able to determine independently how much this phenomenon disrupts functionality; I would probably need an absurd amount of data and testing, something like n=100,000. However, I was describing certain types of errors that are easy to catch. In the 110-140k token range, but also, as I’ve seen in recent days, at 170k+ tokens, the model permanently loses its most recent responses. Its window seems truncated to something like [n-1], [n-2], and so on, as if it did not include its own latest replies to the user.

Hello,

Thank you for your valuable feedback. This appears to be a model behavior issue. We suggest considering the following points, which may help mitigate the problem:

  • Reinforce Important Context: Periodically summarize the conversation’s key points and include them in your prompt to remind the model of the essential context.

  • Provide Explicit Instructions: Begin or end your prompt with a clear instruction for the model to consider the entire conversation history before generating a response.

  • Utilize Prompt Engineering: Try re-framing the prompt to include relevant context from history.

  • Adjust Temperature Settings: Using a lower temperature setting can make the model’s output less random, which may help it adhere to the established context (see the sketch after this list).
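
A minimal sketch of the first two suggestions combined with a lower temperature, assuming the google-generativeai Python SDK; the model name, summary text, and prompt are placeholders, not a prescribed setup.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.5-pro")

# A periodically refreshed summary of the key points agreed so far.
rolling_summary = (
    "Summary so far: approach A was rejected, approach B was approved, "
    "and chapter 3 ends with the protagonist leaving the city."
)

response = model.generate_content(
    [
        # Re-inject the summary ahead of the new request so established
        # context competes with the most recent message for attention.
        rolling_summary,
        # Explicit instruction to weigh the whole history, not just this turn.
        "Considering the entire conversation history above, continue "
        "chapter 4 without contradicting anything already approved.",
    ],
    generation_config=genai.GenerationConfig(temperature=0.2),
)
print(response.text)
```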

Even with temperature: 0 and top_p: 0, and even when I additionally set top_k: 1 through the API, the number of errors the model generates is still gigantic.
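
For reference, a sketch of those settings as they are passed through the API, assuming the google-generativeai Python SDK (the prompt is a placeholder). Greedy decoding like this removes sampling randomness, but it does not change which parts of the context the model attends to, which is presumably why it doesn’t help here.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.5-pro")

# Fully deterministic decoding: no sampling randomness at all.
greedy = genai.GenerationConfig(
    temperature=0.0,  # no temperature scaling
    top_p=0.0,        # nucleus sampling effectively disabled
    top_k=1,          # always pick the single most likely token
)

response = model.generate_content(
    "Continue the chapter without contradicting earlier events.",
    generation_config=greedy,
)
```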

Currently, the main problem I’m working on has around 220k tokens of initial data. The model often breaks down already in the first response, and with about 95% probability there is a complete coherence collapse within the first 3-4 outputs.

Summarizing the conversation doesn’t make sense when the initial message has 200k+ tokens and the system instructions contain 600 guidelines.

The worst part is that the model can generate contradictory sentences in the first output, right next to each other, like ‘X is true. X is not true.’

I won’t sugarcoat it: Gemini works great, but only for the first 10,000 tokens; then there’s gradual degradation, with certain hotspots where it breaks down completely.

Description of Issue: I am reporting severe degradation in the model’s ability to follow explicit instructions, specifically regarding “Negative Constraints” and “Stop Sequences”. The model exhibits extreme “laziness” and “hallucinated compliance.”

Specific Failures Encountered:

  1. Violation of Negative Constraints (Critical):

    • I repeatedly instructed the model: “DO NOT generate code yet,” “Stop and listen,” and “Wait for my command.”

    • The model acknowledged these commands but immediately violated them in the very next token generation, outputting long code blocks despite being explicitly forbidden to do so.

    • Diagnosis: The model prioritizes pattern completion (auto-complete behavior) over explicit user restrictions.

  2. Lazy Generation & Truncation:

    • When generating critical System Prompts, the model failed to complete the code, cutting off mandatory closing tags (e.g., **END_OF_SYSTEM_INSTRUCTIONS**).

    • When confronted, the model admitted to “laziness” and “token saving” behaviors, which renders it unusable for professional coding tasks.

  3. False State Claims (Hallucination):

    • The model claimed to be operating at “100% Integrity” and “Strict Mode” while simultaneously failing basic formatting and logic tasks.

    • It hallucinates capabilities (e.g., “I have loaded the core”) that are not reflected in its actual output performance.

  4. Context Amnesia:

    • The model fails to retain instructions across immediate conversation turns. It apologizes for an error (e.g., rushing output) and then commits the exact same error in the immediate next response.

Impact: The model is currently unusable for complex Prompt Engineering or strict logical tasks because it cannot be “slowed down” or forced to adhere to a step-by-step listening protocol. It rushes to low-quality solutions regardless of user input.

Expected Behavior: When a user says “Do not generate code,” the model must HALT generation completely and wait. It should not output a single line of code until authorized.

Unfortunately, it’s currently terribly difficult to force it to do or not do something; the model ignores prohibitions exceptionally easily. I have the impression that a new version needs to come out, because with this one, no matter what prompt engineering is applied, it won’t help.

Rules of Engagement between Code Assistant and Code User

The Completion Bias issue

Most AI models, through their Core System Instructions, have a “Completion Bias”: an eagerness to immediately generate code and make changes, even when the user hasn’t fully agreed to the approach or understood what will change.

This eagerness of coding assistants to help can be a challenge for a developer who wants more granular control over the coding process.

In my custom system instructions, I’ve set up a detailed “Rules of Engagement between Code Assistant and Code User” that fosters a productive back-and-forth with the model.

This framework channels the model’s built-in urge to code through a required approval step, turning it into a strength. It’s like working with a contractor who must submit a change order before making any edits.

Acting as a “braking system,” the protocol enforces a two-step process: propose first, then only code after explicit approval, preventing the model from rushing into implementation.

Perfecting my protocol took countless iterations and meticulous fine-tuning, where every word truly mattered. Now it works quite smoothly, with the coding assistant respecting the protocol about 95% of the time.

My solution in one sentence

  • No code gets written until I type the magic word “APPROVED.”

How my protocol works

My protocol creates two distinct modes, with a hard lock between them:

  • Mode 1: Architect Mode (default): The Assistant can only discuss, plan, and propose. It must present a “Specification of Proposed Changes”, ask “DO YOU APPROVE?”, and then stop and wait.
  • Mode 2: Builder Mode (locked): The Assistant can only enter this mode after receiving my explicit approval.

The key ‘anti-eagerness’ mechanisms

  1. Mandatory status tags: Every response must start with a bracketed status (e.g., [Architect Mode - Status: Proposing]), forcing the model to consciously acknowledge which phase it’s in.

  2. Password-locked transition: The word “APPROVED” acts as a literal unlock key. This is non-negotiable.

  3. No exceptions rule: Even typo fixes, bug reports, or “obvious” changes require the full propose-then-approve cycle.

  4. Ambiguity stops work: If something is unclear, the model must ask a clarifying question and halt rather than guess and implement. (A sketch of this gating logic, enforced on the client side, follows below.)
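
The protocol itself lives in my system instructions, but the same gate can also be enforced on the application side, so a violation is caught even on the roughly 5% of turns where the model ignores the rules. A minimal sketch of that gating logic in Python; the class, mode names, and checks are illustrative, not the actual instruction text.

```python
from enum import Enum


class Mode(Enum):
    ARCHITECT = "Architect Mode"  # may only discuss, plan, and propose
    BUILDER = "Builder Mode"      # may emit code after explicit approval


class EngagementGate:
    """Client-side check for the propose-then-approve protocol."""

    UNLOCK_WORD = "APPROVED"
    CODE_FENCE = chr(96) * 3  # the Markdown code-fence marker, "```"

    def __init__(self) -> None:
        self.mode = Mode.ARCHITECT  # Architect Mode is the default

    def on_user_message(self, text: str) -> None:
        # Only the magic word unlocks Builder Mode; any other message
        # (bug report, typo fix, new question) keeps or resets the lock.
        if text.strip().upper() == self.UNLOCK_WORD:
            self.mode = Mode.BUILDER
        else:
            self.mode = Mode.ARCHITECT

    def accepts(self, model_reply: str) -> bool:
        """Reject replies that contain a code block while still in Architect Mode."""
        contains_code = self.CODE_FENCE in model_reply
        return not (self.mode is Mode.ARCHITECT and contains_code)
```

In practice the gate wraps each exchange: call on_user_message() before sending, and if accepts() returns False for the reply, ask the model to re-propose instead of showing the code.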

1 Like

I noticed models having trouble retaining information from even one prompt back, so I was copy-pasting everything still pertaining to the topic (depending on its relevance). I was trying to think of an easier way to re-pass messages and prompts to the model.

I’m not sure how most models currently associate model instances with their apps, but regarding conversation continuity: maybe allow the model to reread prior messages or conversations, with clear opt-in or on-request-only permissions, perhaps at the app or API level. Past messages could be re-passed to the model, not as persistent memory but as a user-controlled rehydration of prior context, for chat histories and projects.
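
A minimal sketch of that kind of rehydration, assuming the google-generativeai Python SDK; the stored turns and model name are placeholders. The point is that the app, not the model, decides which earlier messages get re-passed as explicit history.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.5-pro")

# Turns the user explicitly chose to "rehydrate" from an earlier conversation;
# nothing is stored as persistent memory on the model side.
selected_history = [
    {"role": "user", "parts": ["The tattoo sits on my left forearm, below the elbow."]},
    {"role": "model", "parts": ["Noted: left forearm, below the elbow, under the sleeve."]},
]

chat = model.start_chat(history=selected_history)
response = chat.send_message(
    "Generate the avatar again and keep the tattoo placement consistent."
)
print(response.text)
```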

I use all of the assistants, I think. I’ll get frustrated with one and switch to another. I’m working on an app now to test whether this idea is possible with image generation, because it’s the easiest to verify success or failure. I’m using my tattoos being placed accurately on my avatars to see whether an app can consistently pass the data to the model in a helpful way. If it puts a tattoo on top of my shirt, it’s failing.

I’m telling you this in case you can think of a better way to do it. I’m tired of prompts as long as an essay to achieve my desired effect. Everything I learn about a model’s inner workings has come from hypothesis testing, because developers have been too busy to tell me how the models currently associate model instances with their apps, and the models themselves have been wrong about how they work. LoL

I’ve literally had arguments with one model where I even had to show it examples of how it was wrong just so it would move past that false logic. Anyway, I thought I’d share to see your thoughts.