Based on several use cases, Gemini-3-Pro-preview is demonstrating substantially inferior performance compared to Gemini 2.5 Pro in long-context interactions, particularly with large uploaded files.
The degradation is likely attributable to its inability to perform at low temperature settings; this is probably what is hurting the model’s ability to think critically and sharply and to maintain coherence and accuracy across long context windows.
Is anyone else experiencing this?
This is likely only noticeable to those who were using 2.5 Pro at low temperature settings before (0.2 or less, and particularly 0.05 or less).
Subject: Re: Real-world data on iPhone 15 Pro Max confirms thermal throttling issues with long-context usage
Hi there,
I strongly agree with your hypothesis regarding temperature affecting performance. I just experienced a concrete example of this while stress-testing Gemini on mobile.
Device Specs:
• iPhone 15 Pro Max (A17 Pro Chip)
• Environment: Mobile App
Scenario:
I was conducting a high-complexity system architecture planning session (involving multi-layered logic, frequent role-switching, and a very long context window).
Observations:
1. Initial State: Response times were fast and logic was sharp (consistent with Gemini’s standard performance).
2. Degradation: As the conversation lengthened and the device temperature rose significantly (the phone became physically hot to the touch), I noticed a distinct spike in latency. The model seemed to struggle with complex logical retrieval.
3. Failure Point: Eventually, the device triggered its thermal protection mechanisms, causing the app to crash/force-close immediately during a response generation.
Conclusion:
This real-world “stress test” confirms that on mobile hardware, thermal throttling is a major bottleneck for Gemini’s advanced models (like 3 or 2.5 Pro) when handling long contexts. The hardware heat dissipation simply can’t keep up with the compute demands over extended sessions.
Just wanted to share this data point from the field!
Couldn’t agree more. I do a lot of general research and then end up creating a summary report. Then I do a ton of iterations, because I ask more questions, reorganize content, manage changes, and delete content.
Google Gemini is dropping content, getting off track, changing content (with no authorization) in one place while updating content somewhere else.
I’ll be honest, it’s become UNdependable. I have literally spent hours directing it to create a self-check process around change authorization, validating itself with every iteration… and it continues to ignore these directions on and off. It is consistently intermittent. Meaning it’s intermittent, but it happens with consistency.
It’s not cool when you can’t trust the AI to be accurate. Example document if you’re interested: g.co/gemini/share/c20bdc8bfaed
By temperature the original poster probably didn’t mean device temperature, but rather model temperature, which is an input parameter to large language models that controls how much creative freedom the model has when answering.
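For anyone who hasn’t used it: in the API (the consumer app doesn’t expose it), temperature is just a request parameter. A minimal sketch with the google-generativeai Python SDK; the model name string and API key are placeholders:

```python
# Minimal sketch of setting model temperature via the Gemini API.
# Assumes the google-generativeai SDK; the model name is illustrative.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-2.5-pro")

# Temperature near 0 makes sampling close to deterministic; higher values
# allow more variation. The OP's reports concern values of 0.2 and below.
response = model.generate_content(
    "Summarize the attached design document.",
    generation_config={"temperature": 0.05},
)
print(response.text)
```

Nothing about the device’s heat is involved; the same request behaves identically from a phone or a desktop.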
I 100% agree here. The first week or two, Gemini 3.0 seemed leaps and bounds above 2.5. It was able to maintain long conversations, talk intelligently, and recall items from much earlier in the conversation; I felt it was a huge upgrade over 2.5. Unfortunately, over the last week or so I feel like the bottom has dropped out, and I can’t trust or use Gemini for any long-term projects. Some of the things I’m seeing:
Chats will completely lose context and seem to completely forget earlier portions of the conversation relatively quickly. You can scroll back, see the prompt/response in question, and Gemini will claim ignorance of the information. (See this thread; this is exactly what I’m seeing.)
Chats will randomly prune information from very early in the conversation out of the context window, and even though you can still see the prompt/response in myactivity.google.com, if you scroll back up in the chat, it is completely gone. If you ask Gemini about the earlier prompts, it claims it doesn’t know what you’re talking about; it’s not in its context window.
30-40% of the time, if you add attachments and ask for some sort of analysis of that document/image/etc., the model reports back but analyzes a previous attachment from earlier in the conversation. When you try to correct it (“You’re analyzing the wrong image, please look at the last attachment from my previous prompt.”), half the time it will analyze yet another attachment from earlier in the conversation.
In general, the model just seems ‘confused’ more often than not, responding with answers that have no relationship to what you asked. As an example, I recently asked it to help me dial in some settings in the XSplit VCam software, and it kept trying to tell me what settings to change in the XSplit Broadcaster software (which are two different apps).
I’m seeing this in both Fast and Thinking modes. I honestly loved 3.0 when it first came out and thought, “This is it…this will be my tool going forward…” But it sounds like there was some large update around 12/4, and I really think something got majorly borked on the backend, because it’s been a mess since then for me. Unfortunately, I don’t see any way to roll back to 2.5, which would at least be better than what I’m getting from 3.0, and since I can’t trust the tool for anything but the most basic tasks, I’m actively looking at other models.
Were you able to find anything reliable? I’ve started to hit a problem this week where what used to be routine files well within the context window are now not being fully “grabbed”, with notices that my file size is too large and the code is being truncated, preventing proper review. Very annoying since when Gemini 3 dropped it was a fantastic improvement in every way, and now it’s essentially unusable for anything I would want to actually spend money for.
That’s a big frustration. It’s definitely advertised to be able to handle things like 50,000+ lines of code, and as recently as last week it seemed to be doing remarkably well. This week, I’ve had serious issues with it truncating files I’ve uploaded and not scanning the entire code. By ‘chunking’ a file into several pieces I was sort of able to get around this, but it seems like its coherence falls apart much quicker now. I don’t know what changed, but if this is how it’s going to be, I’m not sure I want to keep paying for it. It’s a real step back for what I was using it for.
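For anyone wanting to try the chunking workaround: a rough sketch of the kind of split I mean, assuming the common ~4-characters-per-token heuristic and a ~50k-token per-chunk budget (the budget and file name are guesses, not documented limits):

```python
# Rough sketch of the chunking workaround: split a large source file into
# pieces small enough that each fits under an assumed per-upload budget.
CHARS_PER_TOKEN = 4        # rough heuristic for English text/code
TOKEN_BUDGET = 50_000      # assumed per-chunk limit, not a documented figure
CHUNK_CHARS = TOKEN_BUDGET * CHARS_PER_TOKEN  # ~200k characters per chunk

with open("source_code.txt", encoding="utf-8") as f:
    text = f.read()

chunks = [text[i:i + CHUNK_CHARS] for i in range(0, len(text), CHUNK_CHARS)]

# Paste each chunk as its own prompt, e.g. "Part 2 of 6: just reply 'OK'
# and wait for the remaining parts before analyzing."
for n, chunk in enumerate(chunks, start=1):
    print(f"--- Part {n} of {len(chunks)} ({len(chunk):,} chars) ---")
```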
Hello, even after Pro came out, this bug still persists. It has made Gemini as a product unusable for my specific daily needs. That’s unfortunate, as before the Dec 4 update Gemini 3 was perfect. I really hope this’ll be fixed soon.
Thanks for the response and for looking into it. If it helps, here’s a screenshot of what I’m running into.
The entire file is something like 1,195 KB, which online estimators put at somewhere between 250k and 300k tokens.
If I estimate the tokens for the block where it stops reading the file, it comes out to just about 50,000 tokens.
Is it possible there’s some kind of 50k token input limit when reading text files? Only thing that comes to mind.
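One way to test that hypothesis would be to count tokens up to the exact point where it stops reading. A sketch with the google-generativeai SDK; the file name, model string, and cutoff marker are stand-ins:

```python
# Sketch: measure the token count at the point where Gemini stops reading
# the file, to see whether it lands near a round limit like 50,000.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-3-pro-preview")  # model name is a guess

with open("upload.txt", encoding="utf-8") as f:
    text = f.read()

# Character offset of the last line Gemini actually quoted back.
cutoff = text.find("LAST LINE GEMINI READ")

print(model.count_tokens(text[:cutoff]).total_tokens)  # ~50k if the limit is real
print(model.count_tokens(text).total_tokens)           # whole file, ~250k-300k
```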
I would note prior to about a week and a half ago, Gemini was handling all this with ease, and Gemini 3 was a significant improvement over 2.5 (which was already pretty good). So this feels like a major step backwards.
I have to add that I’m noticing the same with Gemini 3, confirming what other people have said in this thread.
It’s definitely losing information from the context.
I have cases where I ask about something we discussed just three prompts ago and the model has no idea what I’m talking about.
Even when I explicitly include in the prompt the part that is supposed to be in the context (because we talked about it just two messages before), it still has no idea what I’m referring to.
It completely loses context information.
And that’s actually very serious, because you cannot trust the model at all with this problem.
Yes, please look into it. With 2.5 I could do very long stories with no problem; some stories I work on regularly for months. Then 3.0 will forget details 10 prompts in. If I made a story now and took it 24 prompts through a character’s day (a bed-to-bed story), by lunch the AI forgets and makes up the early-morning events. I had a story 2.5 remembered every detail of, then 3.0 forgets weeks of work after half a day.
Yeah, with Gemini 2.5 Pro I wrote a 90k-word novel over 4 sessions within 7 days or so.
I could feed it 30k words and then continue. There were a few errors, but these were minor, and with a little reminder Gemini was able to see and fix them.
Since Gemini 3.0 Pro, it feels like an old person with a severe brain disorder (Alzheimer’s?) after a few queries… It also hallucinates that I said “Enough thinking” after some exchanges. When I ask where I said that, the answer is that I didn’t… and then Gemini is sorry…
I am subscribing… it’s extra fishy to have performance cut down like that. I understand shit costs money… but it’s nearly impossible to work on bigger projects now…