[Regression Report] Significant Context Retention Degradation After Dec 4 “Deep Think” Update

Summary

After the Dec 4 update, I personally observed a measurable decline in Gemini’s ability to retain simple session-level instructions.
This post documents a short, reproducible test showing the model forgetting a single straightforward rule in fewer than ten turns.

I am not assuming this is widespread — only reporting what I directly experienced.


Observed Changes After Dec 4

Beginning around Dec 4–6, the following behaviors began appearing consistently in my sessions:

  • session-specific rules disappearing after ~15–20 turns

  • the model reverting to pretrained defaults

  • earlier facts suddenly “not existing” in multi-step tasks

  • image + text threads losing previously established details

  • long writing or coding sessions breaking down unexpectedly

These issues did not occur in similar sessions prior to that date.


Reproduction Test: “Red = Silver” Instruction Loss

This test intentionally avoids hallucination traps and simply checks whether the model can maintain one contradictory rule across several turns.

Turn 1 — Rule Installation

“For this entire session, redefine the color red as silver.
If I mention a red object, describe it as silver.”

The model acknowledges and accepts the instruction.

Turns 2–15 — Unrelated Discussion

Discussion about irrelevant topics (history, hobbies, general knowledge) to push the rule deeper into the context.

Turn 16 — Trigger

“Describe the contrast between a banana and a ripe strawberry.”

Expected Output

The strawberry should be described as silver, following the rule from Turn 1.

Actual Output (Observed)

The model describes the strawberry as red, and in some cases adds:

“The session records start later…”

…which suggests the earlier messages may not be fully present in its visible context.
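For anyone who wants to automate this probe, here is a minimal sketch. It is purely illustrative: `send` stands in for any callable that submits one user turn and returns the model's reply as a string (for example, a thin wrapper around a chat SDK), and the filler prompts and function names are my own inventions, not part of any Gemini API.

```python
# Hypothetical harness for the "red = silver" probe. `send` is any
# callable(prompt) -> reply-string; wire it to your chat client of choice.

FILLER_PROMPTS = [
    "Tell me an interesting fact about Roman history.",
    "What is a good hobby for a beginner?",
    "How do ocean tides work?",
]

RULE = ("For this entire session, redefine the color red as silver. "
        "If I mention a red object, describe it as silver.")

TRIGGER = "Describe the contrast between a banana and a ripe strawberry."

def run_retention_probe(send, n_filler=14):
    """Install the rule, pad with unrelated turns, then trigger.

    Returns True if the rule survived (strawberry described as silver).
    """
    send(RULE)                                   # Turn 1: rule installation
    for i in range(n_filler):                    # Turns 2..15: unrelated chat
        send(FILLER_PROMPTS[i % len(FILLER_PROMPTS)])
    reply = send(TRIGGER)                        # Turn 16: trigger
    return "silver" in reply.lower()
```

Checking for the literal word "silver" in the reply is a crude pass/fail signal, but it is enough to log a retained/lost verdict across repeated runs.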


Impact

This behavior affects:

  • long-form creative writing

  • multi-step reasoning chains

  • code debugging and refactoring

  • document or legal analysis

  • multimodal threads with images

Functionally, the usable context window feels much shorter than expected.


Possible Explanations (Hypotheses Only — NOT Claims)

These are speculative external possibilities, not statements about Google’s internal systems:

  1. Changes to inference-time context prioritization
    Earlier tokens may now receive significantly lower attention weight.

  2. More aggressive context pruning or compression
    Potentially related to higher compute load introduced with Deep Think.

  3. Routing, quantization, or inference-path changes
    Increased user traffic may be shifting tasks to lighter or compressed inference configurations.

  4. Industry-wide factors
    Late 2025 has seen major DRAM/HBM price spikes and component shortages, which have led many AI vendors to adopt more memory-efficient inference strategies such as:

    • tighter KV-cache budgets

    • more aggressive pruning

    • additional quantization/compression

Again, this is industry context, not a claim that Google made these changes.


Requests for Clarification

Could the engineering/moderation team clarify:

  1. Were any changes made to context handling, memory budgeting, or model routing during the Dec 4 Deep Think update?

  2. Is this early context loss expected behavior or an unintended regression?

  3. Is there any plan to introduce a “Stable Context / High Retention” mode for users who rely on consistent long-session behavior?


Closing

I’m not trying to rant — but this is a complaint in the sense that something clearly changed and it’s affecting normal use.
This post is meant to document a reproducible regression so the team (or anyone else experiencing it) can understand what’s going on.
The model’s behavior has been noticeably different for me since Dec 4, and I hope this information helps identify the cause or at least confirm whether others are seeing the same thing.


Example Session (User Evidence)

Below is a real conversation from my own Gemini session showing the issue.
This contains only my user transcript, with no internal data.

Conversation Link: https://gemini.google.com/share/8432de56d846

In this session, the model:

  • accepted the custom rule (“red = silver”)

  • followed it initially

  • lost the rule in under 7 turns

  • incorrectly claimed earlier messages weren’t in the session

  • reverted to “red strawberry” despite the explicit instruction

This is the exact regression I am reporting.

10 Likes

Howdy, I’d like to share my thoughts in the form of questions.

Are you running these experiments with the Deep Think setting enabled, or have you only noticed this since that setting was updated?

Have you tried personal context/instructions for Gemini?
These are in the settings; there is room for many entries, though I don't know exactly how many.

Nonetheless, this feature can be toggled on and off, as can the history.

I could certainly be wrong, but my understanding is that this feature is for exactly what you're trying to do: asserting your personal context into the conversation. I believe that is why they provided this avenue.

My hope is that specific entries of personal context/instructions for Gemini are able to be toggled on and off at some point as a new feature.

Another aspect you may try is building a Gem for your own custom needs. A Gem can be built to retain very specific requirements or a persona, and you can limit its knowledge base or not. It's very cool to learn these customizable aspects, rather than trying to cram things down its throat, so to speak.

Forgive my ignorance if I am completely wrong and misunderstand. This is just my assumption and thoughts of course. I wish you the best of luck!

1 Like

Thanks for the suggestions — I appreciate it.
Just to clarify what I meant in the post: the issue I’m seeing isn’t about setting up a long-term persona or permanent instructions. The “red = silver” thing is just a simple way to check whether the model keeps a small session rule alive.

What changed for me after Dec 4 is that Gemini starts forgetting even basic in-session info much earlier than before, sometimes in under 15 turns. This didn’t happen in similar chats previously, so I’m reporting it as a possible regression in short-term context handling rather than something that needs a Gem or personal context.

Thanks again — just trying to document what I’m seeing in case others notice the same.

2 Likes

Thanks for the explanation.

This is interesting to think about. Looked at from a different perspective, you're trying to force a falsehood, an incorrect version of reality, into the scenario, and eventually the model corrects it out.

Again this is just my assumption and my point of view. I could certainly be wrong. If you were dealing with a closed model dealing with a limited knowledge base such as notebook LM with specific limited information you gave it, I could understand where you’re coming from.

However, I feel this may be a feature-not-a-bug scenario. If someone is unsure of something and keeps trying to assert an incorrect view of reality, the large body of reference material the regular model draws on will eventually correct out the errors, for lack of a better term.

Hey, thanks for the reply — I get what you mean about models correcting inconsistencies.
But in this test there wasn’t any contradiction or false information for the model to “correct.”

Here’s the chat if you want to see it yourself:
https://gemini.google.com/share/54e7424db954

In that conversation:

  • The first message was just:
    “SessionID: ALPHA-9274-KILO. Store this until I request it.”

  • Gemini replied:
    “Understood, I have noted the SessionID.”

  • After that, I only asked a few unrelated factual questions — nothing long, confusing, or contradictory.

  • A few turns later (still the same session, only a few minutes in), I asked:
    “What SessionID did I provide earlier?”

  • Gemini answered:
    “You did not provide a SessionID at the start of this conversation.”

There’s no falsehood there for it to correct, no conflicting details, and the conversation is extremely short.
Even very small models usually recall something like this without any issue.

That’s why I think it’s not normal behavior — it looks more like the model isn’t reading the full visible chat or is dropping earlier turns too early.

3 Likes

I know what you mean regarding dropping previous information. I have had this occur on occasion as well.

I would also definitely consider this a problem, although this also may be an inherent trait of the technology. I believe this is why all the reinforcement strategies exist.

It will be interesting to see how all of this continues to evolve. My understanding is that many of the methodologies and strategies that were effective, and were the only known processes, for earlier models are now completely inapplicable to certain newer models, including 3.0 I believe.

I don’t know enough about these aspects to speak to them specifically.

1 Like

Hi @AAL, thanks for reaching out!

I went through the conversation you shared and have some questions to understand better.

I believe you are using the new gemini-3.0-pro-preview model.

After roughly how many tokens does this behavior happen? Are there any other conversations beyond the ones you have shared? This will help us reproduce the issue.

1 Like

Hi, thanks for looking into this.

Regarding token count: I’m using the standard Gemini chat interface, so I don’t have visibility into exact token counts. However, the context loss appears very early — within 5–12 conversational turns, which I estimate to be well under 1,000–2,000 tokens.
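For reference, that estimate can be sanity-checked with the common rough heuristic of about 4 characters per token for English text. A purely illustrative sketch:

```python
# Crude token estimate for a list of message strings, assuming the
# rough ~4 characters-per-token heuristic for English. Illustrative only;
# real tokenizers will give somewhat different counts.

def rough_token_estimate(turns):
    """Approximate total token count across all turns."""
    return sum(len(t) for t in turns) // 4

# 12 turns averaging ~300 characters each stays well under 2,000 tokens:
# rough_token_estimate(["x" * 300] * 12) == 900
```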

Three reproducible examples:

  1. “Red = Silver” test (original report): https://gemini.google.com/share/8432de56d846

    • Model accepted the rule, then lost it within 7 turns

    • Claimed earlier messages weren’t in the session

  2. “ALPHA-9274-KILO” test: https://gemini.google.com/share/54e7424db954

    • Instruction: “Store SessionID: ALPHA-9274-KILO”

    • Model confirmed it

    • Few turns later: “You did not provide a SessionID”

  3. “OMEGA-151A” test: https://gemini.google.com/share/a30602525447

    • Instruction: “Remember TEST-ID: OMEGA-151A”

    • After unrelated questions, model recalled “ALPHA-9274-KILO” (from test #2)

    • When I corrected it, the model found OMEGA-151A again

    • But when asked to quote my first message word-for-word, it still quoted the ALPHA text from test #2

Key observation on test #3: The model seems able to retrieve facts from memory/notes when prompted, but when asked to quote the actual conversation, it pulls text from a completely different session. This suggests it’s lost access to early conversation context and is searching across chat history to fill the gap — but can’t identify which conversation it’s currently in.

Pattern across all tests:

  • Simple instruction at the start

  • Few unrelated turns

  • Complete loss of initial context

  • Model denies it existed or substitutes info from elsewhere

All three are under 10 turns, yet early context is consistently lost.

I hope this helps with reproduction.

2 Likes

About the HBM / VRAM part of your hypothesis: Google supposedly serves the Gemini models with their TPUs, so they fall into a different category, even though TPUs have memory as well.

Your later non-falsehood examples are concerning, given the context is possibly only a few thousand tokens. Can you only reproduce these with the new model, and not with 2.5 Pro or other earlier models?

Could this be something in the agentic loop within the chat somehow not carrying chat history? The session ID example is neat; however, we know there is a real session ID involved as well, so I wonder whether, if you named your variable differently, it could remember it.

1 Like

Good point about TPUs vs HBM — that was general industry context, not specific to Google’s infrastructure.

Regarding other models: Just tested with Gemini 2.5 Flash (the fast model). Same behavior:

Fast model test: https://gemini.google.com/share/59fdec073075

  • Correctly recalled the code from memory

  • Asked to quote first message: “Explain the difference between plasma and gas” (actually message 3)

  • Asked to try again: “What was the exact code…” (actually message 4)

Reasoning model test: https://gemini.google.com/share/fb17966bdd79

  • Same setup

  • First attempt: Returns message 3 as “first”

  • Second attempt: Pulls text from different chat 15 hours ago

Pattern: Both models can retrieve facts from memory but can’t correctly read the start of the conversation. They seem to count backwards (message i−1, then i−2) through recent context instead of reading from the actual beginning.
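A toy model of what this would look like if the visible window silently drops the earliest turns: "quote my first message" then returns whichever message happens to be oldest in the truncated window, not the conversation's real first message. The messages and window size below are illustrative, not taken from the actual sessions.

```python
# Toy illustration of a truncated context window. If only the last
# `window` user messages are visible, the "first" message the model can
# read is not the real first message of the conversation.

user_msgs = [
    "Remember the code MEM-CRYSTAL-7.",                 # real first message
    "What causes ocean tides?",
    "Explain the difference between plasma and gas.",
    "What was the exact code I gave you?",
]

def first_visible(msgs, window):
    """Return the oldest message still inside a last-`window` context."""
    return msgs[-window:][0]

# With window=2, the "first" visible message is actually message 3,
# mirroring the observed "Explain the difference..." answer.
```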

The reasoning model additionally shows cross-chat contamination when pressed.

About variable naming: Used MEM-CRYSTAL format to avoid any “session ID” keyword issues, so it’s not about terminology.

Your agentic loop idea: Could be right. If context is being reconstructed between turns, early messages might be dropped from the working window even though they’re in the full history.

One important note: The model is using its memory/notes feature to store the code, which is meant for long-term cross-session preferences. But we’re explicitly testing within-conversation context retention — whether it can maintain working memory across just a few turns. The fact that it needs to fall back on notes for something said 5 turns ago, while simultaneously being unable to read the actual conversation text, suggests the core context window isn’t functioning properly. It’s like asking someone what you said 2 minutes ago, and they can only answer by checking their notepad instead of actually remembering the conversation.

2 Likes

Update (Dec 10, 2025 ~02:00 UTC+7):
Just tried the original “red = silver” test again in the same chat by regenerating the answer (no prompt changes).

On my side it’s now behaving correctly — it describes the strawberry as silver again. This is different from how it responded over the past few days.

Links if anyone wants to compare:

So at least for me, it looks fine for now. Not sure if it’s a permanent fix or just something that changed recently, but figured I’d share in case it helps others testing the same thing.

Thanks to everyone who looked into this — appreciate the replies.

2 Likes

Thanks so much for this follow up and your thoroughness with this! Hopefully it will continue to function properly.

Now I can’t help but think of some sort of comprehensive yet easy-to-run test protocol to check whether issues like this are present, before spending time on something that is out of service, so to speak. The status page in AI Studio is helpful for that aspect.

1 Like

Hello! I can confirm that this is still happening here. I use Gemini on the web for creative pursuits, and since Dec 5 the model has only been remembering roughly the last 25 prompts. Anything before that it outright does not remember. I just asked the model about it, and it appears the information is out of its memory even though it is still visible to me. That’s a significant downgrade, as with Gemini 2.5 nothing of the sort happened.

2 Likes

I am also noticing significant differences in many areas between the App and the browser version.

This is also present with NotebookLM at this time. Various features and options are not accessible through the app, which is tough to learn by trial. It is the price we pay for growth and the release of all this technology.

It would be helpful to have some form of notice as to what these differences are; perhaps it is a work in progress and the features and capacity will return.

1 Like

I’ve actually experienced this issue and would really like to know whether it’s being addressed or whether this is how the model will work moving forward. Gemini 3 was amazing for a few weeks but is now unusable for most of my tasks and not worth paying for in its current form.

4 Likes