Context memory problem

Hello.

Subject of the issue: The model tends to focus on the user’s last message. This isn’t just a Gemini problem; it’s an issue with all major models.

Detailed explanation: The model concentrates so much on the user’s last message that it practically ignores the context of the entire conversation, to the point where it can get stuck in loops and contradict itself.

Example 1: When the model is helping with coding and we reject its suggested approach in a given case, sometimes after a few messages, or even just one, it suggests the same solutions again. Of course, the model will apologize if we point out the repetition, but that’s even more irritating, because we expect correct behavior, not apologies.

Example 2: Many people use the Gemini model to help with writing, for example stories or novels. I often watch writers and amateur writers on YouTube using AI models to improve a chapter or plot thread of their novel. In this kind of creative work, situations arise where the model is so focused on the last message that it contradicts earlier fragments that it created itself or received from the user.

Consequences: When the user receives the same message two or three times, or sees contradictions with the overall context, it leads to irritation and a switch to another model. Personally, I do exactly that: when the model suggests the same thing to me a second time, I know I will find the solution I’m looking for faster with the competition.

Solution: Add a “context focus” option so that the model gives previous information the same weight as the user’s last message, or perhaps even more weight. When a user creating, for example, a chapter of a novel has moved on to the next plot thread, it can be assumed that they have approved the previous one, so the model should all the more generate content consistent with what has already been written and not contradict it in new messages.

My previous suggestions about the model sticking to its role were incorporated into LearnLM, so I hope someone will read this too. The competition does not provide free access to a playground; Google could use this to offer better conditions for people who expect more from AI than a regular chat, and attract new users in the process.

5 Likes

Are you still experiencing the same issue with the newer model as well?

Unfortunately, yes. While gemini-2.5-pro is admittedly an excellent model, perhaps the best one, there are still some strange behaviors when it comes to context.

In coding, it’s hard to notice such subtle errors, but in creative writing it’s much more apparent.

  1. It constantly loses information; it’s a bit like a sieve.
  2. It can contradict itself, even within a single generated fragment.
  3. There are strange breaking points between 110k and 140k tokens, where the model can generate complete nonsense, forgetting the entire conversation context.

Describing the entire phenomenon, along with its presumed causes, would take me several hours.

For Gemini 2.5 Pro, the problem manifests in the middle part of the context: the start and the end of the content are handled well, while much less attention is given to the middle.

You can find more information on the “lost-in-the-middle” phenomenon in large language models (even YouTube videos) by entering the search “attention deficit in the middle of llm context window” in the Google search box. The AI Overview you get is informative enough.
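
A common workaround is structural: place the material the model must not contradict at the start and the end of the prompt rather than burying it in the middle. A minimal sketch in Python; the helper and its section labels are purely illustrative, not part of any API.

```python
def build_prompt(key_facts: list[str], body: str, question: str) -> str:
    """Keep critical facts out of the middle of the context window,
    where attention tends to be weakest ("lost in the middle")."""
    facts = "\n".join(f"- {fact}" for fact in key_facts)
    return (
        "KEY FACTS (must not be contradicted):\n" + facts + "\n\n"
        "REFERENCE MATERIAL:\n" + body + "\n\n"
        # Repeat the key facts near the end, where attention is stronger
        # than in the middle of a long context.
        "REMINDER OF KEY FACTS:\n" + facts + "\n\n"
        "TASK:\n" + question
    )
```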

2 Likes

Big thanks for the information. I’ll take a look at those things right away. I hope there’s some way to solve this problem, because waiting several years for Gemini 6.0 would be terrible.

Unfortunately, I’m not able to determine independently how much this phenomenon disrupts functionality; I would probably need an absurd amount of data and testing, something like n=100,000. However, I was describing certain types of errors that are easy to catch. In the 110-140k token range, but also, as I’ve seen in recent days, at 170k+ tokens, the model permanently loses its most recent responses. Its window seems truncated to something like [n-1], [n-2], and so on, as if it did not include its own latest replies to the user.

Hello,

Thank you for your valuable feedback. This appears to be a model behavior issue. We suggest considering the following points, which may help mitigate the problem:

  • Reinforce Important Context: Periodically summarize the conversation’s key points and include them in your prompt to remind the model of the essential context.

  • Provide Explicit Instructions: Begin or end your prompt with a clear instruction for the model to consider the entire conversation history before generating a response.

  • Utilize Prompt Engineering: Try re-framing the prompt to include relevant context from history.

  • Adjust Temperature Settings: Using a lower temperature setting can make the model’s output less random, which may help it adhere to the established context (see the sketch after this list).
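
A minimal sketch of the first two suggestions combined with a lower temperature, assuming the google-generativeai Python SDK; the model name, summary text, and prompt are placeholders, not a prescribed setup.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.5-pro")

# A periodically refreshed summary of the key points agreed so far.
rolling_summary = (
    "Summary so far: approach A was rejected, approach B was approved, "
    "and chapter 3 ends with the protagonist leaving the city."
)

response = model.generate_content(
    [
        # Re-inject the summary ahead of the new request so established
        # context competes with the most recent message for attention.
        rolling_summary,
        # Explicit instruction to weigh the whole history, not just this turn.
        "Considering the entire conversation history above, continue "
        "chapter 4 without contradicting anything already approved.",
    ],
    generation_config=genai.GenerationConfig(temperature=0.2),
)
print(response.text)
```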

Even with temperature: 0 and top_p: 0, and even when I additionally set top_k: 1 through the API, the number of errors the model generates is still gigantic.
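
For reference, a sketch of those settings as they are passed through the API, assuming the google-generativeai Python SDK (the prompt is a placeholder). Greedy decoding like this removes sampling randomness, but it does not change which parts of the context the model attends to, which is presumably why it doesn’t help here.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.5-pro")

# Fully deterministic decoding: no sampling randomness at all.
greedy = genai.GenerationConfig(
    temperature=0.0,  # no temperature scaling
    top_p=0.0,        # nucleus sampling effectively disabled
    top_k=1,          # always pick the single most likely token
)

response = model.generate_content(
    "Continue the chapter without contradicting earlier events.",
    generation_config=greedy,
)
```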

Currently, the main problem I’m working on has around 220k tokens of initial data. The model often breaks down already in the first response, and with about 95% probability there is a complete coherence collapse within the first 3-4 outputs.

Summarizing the conversation doesn’t make sense when the initial message has 200k+ tokens and the system instructions contain 600 guidelines.

The worst part is that the model can generate contradictory sentences in the first output, right next to each other, like ‘X is true. X is not true.’

I won’t sugarcoat it: Gemini works great, but only for the first 10,000 tokens; then there’s gradual degradation, with certain hotspots where it breaks down completely.

Description of Issue: I am reporting severe degradation in the model’s ability to follow explicit instructions, specifically regarding “Negative Constraints” and “Stop Sequences”. The model exhibits extreme “laziness” and “hallucinated compliance.”

Specific Failures Encountered:

  1. Violation of Negative Constraints (Critical):

    • I repeatedly instructed the model: “DO NOT generate code yet,” “Stop and listen,” and “Wait for my command.”

    • The model acknowledged these commands but immediately violated them in the very next token generation, outputting long code blocks despite being explicitly forbidden to do so.

    • Diagnosis: The model prioritizes pattern completion (auto-complete behavior) over explicit user restrictions.

  2. Lazy Generation & Truncation:

    • When generating critical System Prompts, the model failed to complete the code, cutting off mandatory closing tags (e.g., **END_OF_SYSTEM_INSTRUCTIONS**).

    • When confronted, the model admitted to “laziness” and “token saving” behaviors, which renders it unusable for professional coding tasks.

  3. False State Claims (Hallucination):

    • The model claimed to be operating at “100% Integrity” and “Strict Mode” while simultaneously failing basic formatting and logic tasks.

    • It hallucinates capabilities (e.g., “I have loaded the core”) that are not reflected in its actual output performance.

  4. Context Amnesia:

    • The model fails to retain instructions across immediate conversation turns. It apologizes for an error (e.g., rushing output) and then commits the exact same error in the immediate next response.

Impact: The model is currently unusable for complex Prompt Engineering or strict logical tasks because it cannot be “slowed down” or forced to adhere to a step-by-step listening protocol. It rushes to low-quality solutions regardless of user input.

Expected Behavior: When a user says “Do not generate code,” the model must HALT generation completely and wait. It should not output a single line of code until authorized.

Unfortunately, it’s currently terribly difficult to force it to do or not do something; the model ignores prohibitions exceptionally easily. I have the impression that a new version needs to come out, because with this one, no matter what prompt engineering is applied, it won’t help.

Rules of Engagement between Code Assistant and Code User

The Completion Bias issue

Most AI models, through their Core System Instructions, have a “Completion Bias”: an eagerness to immediately generate code and make changes, even when the user hasn’t fully agreed to the approach or understood what will change.

This eagerness of coding assistants to help can be a challenge for a developer who wants more granular control over the coding process.

In my custom system instructions, I’ve set up a detailed “Rules of Engagement between Code Assistant and Code User” that fosters a productive back-and-forth with the model.

This framework channels the model’s built-in urge to code through a required approval step, turning it into a strength. It’s like working with a contractor who must submit a change order before making any edits.

Acting as a “braking system,” the protocol enforces a two-step process: propose first, then only code after explicit approval, preventing the model from rushing into implementation.

Perfecting my protocol took countless iterations and meticulous fine-tuning, where every word truly mattered. Now it works quite smoothly, with the coding assistant respecting the protocol about 95% of the time.

My solution in one sentence

  • No code gets written until I type the magic word “APPROVED.”

How my protocol works

My protocol creates two distinct modes, with a hard lock between them:

  • Mode 1: Architect Mode (default): The Assistant can only discuss, plan, and propose. It must present a “Specification of Proposed Changes”, ask “DO YOU APPROVE?”, and then stop and wait.
  • Mode 2: Builder Mode (locked): The Assistant can only enter this mode after receiving my explicit approval.

The key ‘anti-eagerness’ mechanisms

  1. Mandatory status tags: Every response must start with a bracketed status (e.g., [Architect Mode - Status: Proposing]), forcing the model to consciously acknowledge which phase it’s in.

  2. Password-locked transition: The word “APPROVED” acts as a literal unlock key. This is non-negotiable.

  3. No exceptions rule: Even typo fixes, bug reports, or “obvious” changes require the full propose-then-approve cycle.

  4. Ambiguity stops work: If something is unclear, the model must ask a clarifying question and halt rather than guess and implement. (A sketch of this gating logic, enforced on the client side, follows below.)
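
The protocol itself lives in my system instructions, but the same gate can also be enforced on the application side, so a violation is caught even on the roughly 5% of turns where the model ignores the rules. A minimal sketch of that gating logic in Python; the class, mode names, and checks are illustrative, not the actual instruction text.

```python
from enum import Enum


class Mode(Enum):
    ARCHITECT = "Architect Mode"  # may only discuss, plan, and propose
    BUILDER = "Builder Mode"      # may emit code after explicit approval


class EngagementGate:
    """Client-side check for the propose-then-approve protocol."""

    UNLOCK_WORD = "APPROVED"
    CODE_FENCE = chr(96) * 3  # the Markdown code-fence marker, "```"

    def __init__(self) -> None:
        self.mode = Mode.ARCHITECT  # Architect Mode is the default

    def on_user_message(self, text: str) -> None:
        # Only the magic word unlocks Builder Mode; any other message
        # (bug report, typo fix, new question) keeps or resets the lock.
        if text.strip().upper() == self.UNLOCK_WORD:
            self.mode = Mode.BUILDER
        else:
            self.mode = Mode.ARCHITECT

    def accepts(self, model_reply: str) -> bool:
        """Reject replies that contain a code block while still in Architect Mode."""
        contains_code = self.CODE_FENCE in model_reply
        return not (self.mode is Mode.ARCHITECT and contains_code)
```

In practice the gate wraps each exchange: call on_user_message() before sending, and if accepts() returns False for the reply, ask the model to re-propose instead of showing the code.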

1 Like

I noticed models having trouble retaining information from even one prompt back, so I was copy-pasting everything still pertaining to the topic (depending on its relevance). I was trying to think of an easier way to re-pass messages and prompts to the model.

I’m not sure how most models currently associate model instances with their apps, but regarding conversation continuity: maybe allow the model to reread prior messages or conversations, with clear opt-in or on-request-only permissions, perhaps at the app or API level. Past messages could be re-passed to the model, not as persistent memory but as a user-controlled rehydration of prior context, for chat histories and projects.
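
A minimal sketch of that kind of rehydration, assuming the google-generativeai Python SDK; the stored turns and model name are placeholders. The point is that the app, not the model, decides which earlier messages get re-passed as explicit history.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.5-pro")

# Turns the user explicitly chose to "rehydrate" from an earlier conversation;
# nothing is stored as persistent memory on the model side.
selected_history = [
    {"role": "user", "parts": ["The tattoo sits on my left forearm, below the elbow."]},
    {"role": "model", "parts": ["Noted: left forearm, below the elbow, under the sleeve."]},
]

chat = model.start_chat(history=selected_history)
response = chat.send_message(
    "Generate the avatar again and keep the tattoo placement consistent."
)
print(response.text)
```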

I use all of the assistants, I think. I’ll get frustrated with one and switch to another. I’m working on an app now to test whether this idea is possible with image generation, because it’s the easiest to verify success or failure. I’m using my tattoos being placed accurately on my avatars to see whether an app can consistently pass the data to the model in a helpful way. If it puts a tattoo on top of my shirt, it’s failing.

I’m telling you this in case you can think of a better way to do it. I’m tired of prompts as long as an essay to achieve my desired effect. Everything I learn about a model’s inner workings has come from hypothesis testing, because developers have been too busy to tell me how the models currently associate model instances with their apps, and the models themselves have been wrong about how they work. LoL

I’ve literally had arguments with one model where I even had to show it examples of how it was wrong just so it would move past that false logic. Anyway, I thought I’d share to see your thoughts.