Gemini 3 significantly worse than 2.5 Pro at long context. Temperature likely to blame

JSON_B_Kidd · December 2, 2025, 10:27pm

Based on several use cases, Gemini-3-Pro-preview performs substantially worse than Gemini 2.5 Pro on long-context interactions, particularly those involving large uploaded files.

The degradation is likely attributable to its inability to perform at low temperature settings. This is probably what is affecting the model’s ability to think critically and sharply and to maintain coherence and accuracy across long context windows.

Is anyone else experiencing this?

This is likely only noticeable to those who were using 2.5 Pro at low temperature settings before (0.2 or less, and particularly 0.05 or less).

Mrinal_Ghosh · December 3, 2025, 8:52am

Hi @JSON_B_Kidd ,

Welcome to the Forum!
Thank you for your feedback. We appreciate you taking the time to share your thoughts with us.

To help us understand and resolve the issue you’re experiencing, please provide us with the steps you take that lead to the problem.

ONX_Universe.TW · December 6, 2025, 3:38pm

Subject: Re: Real-world data on iPhone 15 Pro Max confirms thermal throttling issues with long-context usage

Hi there,

I strongly agree with your hypothesis regarding temperature affecting performance. I just experienced a concrete example of this while stress-testing Gemini on mobile.

Device Specs:

• iPhone 15 Pro Max (A17 Pro Chip)

• Environment: Mobile App

Scenario:

I was conducting a high-complexity system architecture planning session (involving multi-layered logic, frequent role-switching, and a very long context window).

Observations:

1. Initial State: Response times were fast and logic was sharp (consistent with Gemini’s standard performance).

2. Degradation: As the conversation lengthened and the device temperature rose significantly (the phone became physically hot to the touch), I noticed a distinct spike in latency. The model seemed to struggle with complex logical retrieval.

3. Failure Point: Eventually, the device triggered its thermal protection mechanisms, causing the app to crash immediately during response generation.

Conclusion:

This real-world “stress test” confirms that on mobile hardware, thermal throttling is a major bottleneck for Gemini’s advanced models (like 3 or 2.5 Pro) when handling long contexts. The hardware heat dissipation simply can’t keep up with the compute demands over extended sessions.

Just wanted to share this data point from the field!

R_S1 · December 8, 2025, 1:15am

Couldn’t agree more. I do a lot of general research and then end up creating a summary report. Then I do a ton of iterations because I ask more questions, reorganize content, and manage, change, and delete content.

Google Gemini is dropping content, getting off track, changing content (with no authorization) in one place while updating content somewhere else.

I’ll be honest, it’s become UNdependable. I have literally spent hours directing it to create a self check process related to authorization of change, validating itself regularly with every iteration… And it continues to ignore these directions on and off. It is consistently intermittent. Meaning it’s intermittent, but it happens with consistency.

It’s not cool when you can’t trust the AI to be accurate. Example document if you’re interested: g.co/gemini/share/c20bdc8bfaed

tocsa · December 8, 2025, 3:55pm

By temperature, the original poster probably didn’t mean device temperature, but rather model temperature, which is an input parameter to large language models that controls how much creative freedom the model has when answering questions.
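For readers unfamiliar with the parameter, here is a minimal sketch of how sampling temperature is usually described: the model's raw scores are divided by the temperature before being turned into probabilities. This assumes the common softmax-with-temperature formulation; how Gemini applies the setting internally is not stated in this thread, and the scores below are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens (not real model output).
logits = [2.0, 1.0, 0.5]

# The low values are the settings mentioned in the original post.
for t in (1.0, 0.2, 0.05):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 4) for p in probs])
```

As the temperature drops toward 0.05, nearly all of the probability lands on the top-scoring token, which is why low settings are typically chosen for precise, repeatable answers.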

Matthew_Starks · December 9, 2025, 1:04am

I’m not a developer but Gemini 3 is telling me it has a fatal flaw and can’t be trusted. Kinda freaking me out. Lol

MBB · December 12, 2025, 1:42pm

I 100% agree here. The first week or two, Gemini 3.0 seemed leaps and bounds above 2.5. It was able to maintain long conversations, talk intelligently, and recall items from much earlier in the conversation, and I felt it was a huge upgrade over 2.5. Unfortunately, over the last week or so I feel like the bottom has dropped out and I can’t trust or use Gemini for any long-term projects. Some items I’m seeing:

• Chats will completely lose context and seem to forget earlier portions of the conversation relatively quickly. You can scroll back, see the prompt/response in question, and Gemini will claim ignorance of the information. (See this thread, this is exactly what I’m seeing.)
• Chats will randomly prune information from very early in the conversation out of the context window. Even though you can still see the prompt/response in myactivity.google.com, if you scroll back up in the chat, they are completely gone. If you ask Gemini about the earlier prompts, it claims it doesn’t know what you’re talking about and that it’s not in its context window.
• 30-40% of the time, if you add attachments and ask for some sort of analysis of that document/image/etc., the model reports back but analyzes a previous attachment from earlier in the conversation. When you try to correct it (“You’re analyzing the wrong image, please look at the last attachment from my previous prompt.”), half the time it will analyze a different attachment from earlier in the conversation.
• In general, the model just seems ‘confused’ more often than not, responding with answers that have no relationship to what you asked. As an example, I recently asked it to help me dial in some settings in the XSplit VCam software, and it kept trying to tell me what settings to change in the XSplit Broadcaster software (these are two different apps).

I’m seeing this in both Fast and Thinking modes. I honestly loved 3.0 when it first came out and thought “This is it…this will be my tool going forward…” But it sounds like there was some large update around 12/4, and I really think something got majorly borked on the backend because it’s been a mess since then for me. Unfortunately, I don’t see any way to roll back to 2.5, which would at least be better than what I’m getting from 3.0. Since I can’t trust the tool for anything but the most basic tasks, I’m actively looking at other models.

Jough_Donakowski · December 16, 2025, 5:15pm

Were you able to find anything reliable? I’ve started to hit a problem this week where what used to be routine files well within the context window are now not being fully “grabbed”, with notices that my file size is too large and the code is being truncated, preventing proper review. Very annoying since when Gemini 3 dropped it was a fantastic improvement in every way, and now it’s essentially unusable for anything I would want to actually spend money for.

MBB · December 17, 2025, 3:40pm

No, unfortunately, it seems like it’s somewhat ‘by design’.

Jough_Donakowski · December 17, 2025, 3:49pm

Hmmmm. Thanks. At least it’s not just me lol

That’s a big frustration. It’s definitely advertised as being able to handle things like 50,000+ lines of code, and as recently as last week it seemed to be doing remarkably well. This week, I’ve had serious issues with it truncating files I’ve uploaded and not scanning the entire code. By ‘chunking’ a file into several pieces I was sort of able to get around this, but it seems like its coherence falls apart much quicker now. I don’t know what changed, but if this is how it’s going to be, I’m not sure I want to be paying for it any more. It’s a real step back for what I was using it for.
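For anyone trying the same workaround, here is a minimal sketch of what ‘chunking’ a file could look like. The function name `chunk_lines` and the character budget are hypothetical choices for illustration, not the poster's actual method and not anything Gemini provides; the sketch breaks only at line ends so no line of code is cut in half.

```python
def chunk_lines(text, max_chars):
    """Split text into pieces of roughly max_chars, breaking only at line ends.

    A single line longer than max_chars is kept whole in its own chunk.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        # Close the current chunk before it would exceed the budget.
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


# Stand-in for a large source file (hypothetical content).
sample = "".join(f"line {i}\n" for i in range(1000))
pieces = chunk_lines(sample, max_chars=2000)
print(len(pieces), "chunks; largest is", max(len(p) for p in pieces), "characters")
```

Joining the pieces back together reproduces the original text exactly, so nothing is lost by splitting; whether the model stays coherent across the pieces is a separate question.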

Caio_Reberte · December 18, 2025, 1:36am

Hello, even after Pro came out, this bug still persists. It has made Gemini as a product unusable for my specific daily needs. That’s unfortunate, as before the Dec 4 update, Gemini 3 was perfect. I really hope that this’ll be fixed soon.

Mrinal_Ghosh · December 19, 2025, 8:47am

Hi all,

Thank you for bringing this to our attention. We truly appreciate you flagging this issue, and we will file a bug internally.

Jough_Donakowski · December 19, 2025, 2:26pm

Thanks for the response and for looking into it. If it helps, here’s a screenshot of what I’m running into.

The entire file is something like 1,195 KB, which online estimators put at somewhere between 250k and 300k tokens.

If I estimate the tokens for the block where it stops reading the file, it comes out to just about 50,000 tokens.

Is it possible there’s some kind of 50k token input limit when reading text files? Only thing that comes to mind.
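As a rough sanity check on those numbers, here is a back-of-the-envelope sketch. It assumes 1 KB = 1,024 bytes, roughly one byte per character, and a rule-of-thumb range of 4 to 5 characters per token; these are assumptions, not Gemini's actual tokenizer, and `estimate_tokens` is a hypothetical helper.

```python
def estimate_tokens(size_bytes, chars_per_token=4.0):
    """Rough token estimate from file size, assuming about 1 byte per character."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return round(size_bytes / chars_per_token)


file_bytes = 1195 * 1024   # the 1,195 KB file described above
cutoff_tokens = 50_000     # estimated point where reading stops

for cpt in (4.0, 4.5, 5.0):  # assumed characters-per-token ratios
    total = estimate_tokens(file_bytes, cpt)
    share = cutoff_tokens / total
    print(f"{cpt} chars/token: ~{total:,} tokens, cutoff at ~{share:.0%} of the file")
```

Under these assumptions the estimate lands close to the 250k-300k range quoted above, and a 50,000-token cutoff would fall somewhere around the first sixth to fifth of the file.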

I would note prior to about a week and a half ago, Gemini was handling all this with ease, and Gemini 3 was a significant improvement over 2.5 (which was already pretty good). So this feels like a major step backwards.

Federico_SP · December 26, 2025, 5:03pm

I have to add that I’m noticing the same with Gemini 3, confirming what other people have said in this thread.
It’s definitely losing information from the context.
I have cases where I ask about something we talked about just three prompts ago and the model has no idea what I’m talking about.
Even when I try to confirm by explicitly writing out part of a prompt that is supposed to be in the context, because we talked about it just two messages before, it has no idea what I’m talking about.
It completely loses context information.
And that’s actually very serious, because you cannot trust the model at all with this problem.

Lordmsryinium · January 1, 2026, 6:29pm

Yes, please look into it. With 2.5 I could do very long stories with no problem; some stories I work on regularly for months. 3.0 will forget details 10 prompts in. If I start a story now and make it 24 prompts covering a character’s day, a bed-to-bed story, by lunch the AI forgets and makes up the early morning events. I had a story that 2.5 remembered every detail of; 3.0 forgets weeks of work after half a day.

Ray_Toshlyra · January 25, 2026, 10:38am

Yeah, with Gemini 2.5 Pro I wrote a 90k novel over 4 sessions within 7 days or so.
I could feed it 30k words and then continue. There were a few errors, but these were minor, and with a little reminder my Gemini was able to see and fix them.

Since Gemini 3.0 Pro it feels like an old person with a severe brain disorder (Alzheimer’s?) after a few queries… It also hallucinates that I said “Enough Thinking” after some exchanges… When I ask where I said that, the answer is that I did not… and then Gemini is sorry…

I am subscribing… this is extra fishy to have performance cut down like that… I understand shit costs money… but it’s nearly impossible to work on bigger projects now…
