Context memory problem

Hello.

Subject of the issue: The model tends to over-focus on the user’s last message. This isn’t just a Gemini problem; it affects all major models.

Detailed explanation: The model concentrates so much on the user’s last message that it practically ignores the context of the entire conversation, to the point where it can get stuck in loops and contradict itself.

Example 1: When the model is helping with coding and we reject one of its suggestions, sometimes after just one or a few messages it suggests the same solution again. Of course, the model apologizes if we point out the repetition, but that is even more irritating, because we expect correct behavior, not apologies.

Example 2: Many people use Gemini to help with writing, for example stories or novels. I often watch YouTube videos in which a professional or amateur writer uses AI models to improve a chapter or plot thread of their novel. In creative work, situations arise where the model is so focused on the last message that it contradicts earlier passages, whether it wrote them itself or received them from the user.

Consequences: When the user receives the same suggestion two or three times, or sees contradictions with the overall context, it leads to frustration and switching to another model. Personally, I do exactly that: when the model suggests the same thing a second time, I know I will find the solution I’m looking for faster with a competitor.

Solution: Add a “context focus” option, so that the model gives the same weight to previous information as to the user’s last message. Or perhaps even give more weight to previous information: when a user is writing, say, a chapter of a novel and has moved on to the next plot thread, it can be assumed that they approved the previous one, so the model should all the more produce content consistent with what has already been written and not contradict it in new messages.
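To illustrate what I mean by keeping weight on previous information: today this has to be emulated by hand, for example by re-sending a pinned summary of earlier decisions on every request so it never drifts out of focus. A minimal sketch, assuming the Python `google-generativeai` SDK; the pinned text, the decisions listed, and the model name are just placeholders:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Hypothetical running summary of decisions already made in the conversation.
pinned_context = (
    "Decisions so far (do not re-suggest rejected ideas):\n"
    "- Rejected: switching the project to async I/O.\n"
    "- Accepted: keep the existing SQLite schema.\n"
)

# The summary is passed as the system instruction, so it is re-sent with every call
# instead of relying on the model to keep attending to old turns.
model = genai.GenerativeModel(
    "gemini-1.5-pro",
    system_instruction=pinned_context,
)

chat = model.start_chat()
reply = chat.send_message("Suggest the next refactoring step.")
print(reply.text)
```

This only approximates a real “context focus” setting, of course; the summary has to be maintained manually as the conversation progresses.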

My previous suggestions about the model sticking to its role were incorporated into LearnLM, so I hope someone will read this too. Competitors do not provide free access to a playground; Google could use this to offer better conditions for people who expect more from AI than a regular chat, and thereby attract new users.


Are you still experiencing the same issue with the newer model as well?

Unfortunately, yes. While gemini-2.5-pro is clearly an excellent model, perhaps the best one, there are still some strange behaviors when it comes to context.

In coding such subtle errors are hard to notice, but in creative writing they are much more apparent.

  1. It constantly loses information; it’s a bit like a sieve.
  2. It can contradict itself, even within a single generated fragment.
  3. There are strange breaking points between 110k and 140k tokens, where the model can generate complete absurdities, forgetting the entire conversation context.

Describing the entire phenomenon, along with the presumed causes, would take me several hours.

For Gemini 2.5 Pro, the problem manifests in the middle part of the context. The start and the end of the content are handled effectively; much less attention is given to the middle part.

You can find more information on the “lost-in-the-middle” phenomenon in large language models (even YouTube videos) by entering the search “attention deficit in the middle of llm context window” in the Google search box. The AI Overview you get is informative enough.
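If you want to see the effect yourself, a rough way is to bury one unique fact (“needle”) at different depths of a long padded prompt and check whether the model still retrieves it. A minimal sketch, assuming the Python `google-generativeai` SDK; the filler text, needle, and model name are hypothetical placeholders:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# Generic padding; increase the repetition count to push the prompt toward 100k+ tokens.
filler = "The sky was grey and nothing happened. " * 4000
needle = "The courier's secret passphrase is 'violet anchor 42'."

# Place the needle near the start, the middle, and the end of the padded context.
for position in (0.05, 0.5, 0.95):
    cut = int(len(filler) * position)
    prompt = (
        filler[:cut] + "\n" + needle + "\n" + filler[cut:]
        + "\n\nQuestion: what is the courier's secret passphrase?"
    )
    answer = model.generate_content(prompt).text
    print(f"depth {position:.0%}: "
          f"{'found' if 'violet anchor 42' in answer else 'missed'}")
```

Typically the needle placed in the middle is the one most likely to be missed, which is exactly the “lost-in-the-middle” pattern.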


Big thanks for the information. I’ll take a look at those things right away. I hope there’s some way to solve this problem, because waiting several years for Gemini 6.0 would be terrible.

Unfortunately, I’m not able to independently determine how much this phenomenon disrupts functionality. I would probably need an absurd amount of data and testing, something like n = 100,000. However, I was describing certain types of errors that are easy to catch. In the 110–140k token range, but also, as I’ve seen in recent days, at 170k+ tokens, the model permanently loses its last responses, as if its window were truncated to something like [n-1], [n-2], etc., without including its most recent replies to the user.
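For reference, it is at least possible to check how close a conversation is getting to that range with the SDK’s token counter and summarize older turns before continuing. A rough sketch, assuming the Python `google-generativeai` SDK; the model name and the 110k threshold are just the approximate point where the problems appear, not an official limit:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

chat = model.start_chat()
# ... conversation continues, chat.history accumulates turns ...

# Count tokens over the accumulated history and warn before the problematic range.
used = model.count_tokens(chat.history).total_tokens
if used > 110_000:
    print(f"History is at {used} tokens; consider summarizing older turns "
          "before continuing.")
```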