Massive Regression: Detailed Gemini Thinking Process vanished from AI Studio

Hate to double-post, but I had a realization: just enable raw thoughts in AI Studio and the Gemini app. Keep the API summarized, or maybe allow a limited number of raw requests per day on the API. Distillation becomes harder, and real human users get what they need, even for prolonged work, especially in the Gemini web client. How many requests is a web-client or AI Studio user gonna make? Abuse of that kind of tool can be detected programmatically. Or restrict it to activated billing accounts only! Know your customer!

6 Likes

100% agree. If the concern is competitors reverse-engineering from the raw CoT, just remove it from the API alone, so that individual end users can still access the raw CoT for troubleshooting.

4 Likes

I’m probably going to get banned for this, but I don’t care at this point, so I’ll be honest.
No amount of “improvements” to summaries will ever replace raw, detailed thoughts. And I’m almost certain that you and your team understand this perfectly well. This isn’t just a “different experience”; it’s a fundamental downgrade that strips users of a critical tool for understanding and debugging. Many of us relied on that transparency to actually work effectively with the model.
And regarding the quality of the Gemini 2.5 Pro 05-06 model itself (I know this isn’t the specific thread for it, but frankly, I don’t care anymore): it’s completely unusable for anything! Google is positioning 05-06 as an improvement? Are you kidding? It can barely hold context in a short conversation, let alone handle complex tasks. The proclaimed improvements are nowhere to be seen, while the regression is obvious.
How could you replace the genuinely groundbreaking 03-25 version with this ****? Users and developers have already told you that the previous version was the best one out there.
Why all these deliberate downgrades? Is it an attempt to hide the model’s true capabilities from competitors, sacrificing convenience and efficiency for regular users and developers? Or is it to push us into subscribing to Ultra for $250 a month?
We see that you “hear” us, but there’s a persistent feeling that no matter how detailed our explanations of the problems are, it won’t lead to any real changes. This whole situation isn’t just perceived as ignorance; it’s blatant disrespect to the community that believed in Google. It feels like a slap in the face to those who were genuinely amazed by Gemini’s capabilities.
It’s hard to express the depth of disappointment without resorting to harsh language. This is a disgrace for the company. We see that you understand the core issues, but instead of constructive steps and genuine dialogue, we’re witnessing attempts to sweep the situation under the rug. It’s just a disgusting feeling of betrayal.

9 Likes

This update has quite significant implications, so although it might be long, I’ll take my time to write this out carefully. I want to talk about some less-discussed reasons, key points, and potential suggestions.


Prompts are fundamentally necessary to constrain a model’s output. Unless highly flexible processing like that of an AI assistant is required, most production environments demand refining the system prompt to ensure the model responds consistently. In other words, outputs must be constrained to be predictable, thereby enhancing stability and reliability.

To achieve this, ‘prompt engineering’ is essential. This engineering typically involves the following procedure:

Input prompt → Observe model’s output (feedback) → Add conditions and refine instructions → Repeat the process

The final output of a reasoning model is absolutely dependent on its ‘reasoning process’ (Seriously. This is a critical point). Therefore, prompts need to be modified while observing this ‘reasoning process’, but… well, as it stands, we can’t see this reasoning process. (At the very least, it’s extremely difficult to catch subtle or key changes in the reasoning process with only a summary.)
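The feedback loop described above can be sketched in code. This is a toy illustration, not the real Gemini API: `call_model`, `refine`, and the stub behavior are all hypothetical stand-ins, chosen only to show why observing the reasoning (not just the output) matters when tightening a prompt.

```python
# Sketch of the prompt-engineering loop:
#   input prompt -> observe output AND reasoning -> add conditions -> repeat.
# call_model() is a hypothetical stub standing in for a real LLM call, so
# the example is self-contained and runnable.

def call_model(prompt: str) -> dict:
    """Stub model: its reasoning reveals an unstated assumption about
    output format until the prompt explicitly constrains it."""
    if "Respond in JSON" in prompt:
        return {"reasoning": "User demands JSON, so emit a JSON object.",
                "output": '{"answer": 42}'}
    return {"reasoning": "Format unspecified; I will write free-form prose.",
            "output": "The answer is forty-two."}

def refine(prompt: str, is_acceptable, add_constraint, max_rounds: int = 5) -> str:
    """Iteratively tighten the prompt until the output is acceptable."""
    for _ in range(max_rounds):
        result = call_model(prompt)
        if is_acceptable(result["output"]):
            return prompt
        # Reading the reasoning tells us WHY the output went wrong --
        # exactly the signal a lossy summary can hide.
        prompt = add_constraint(prompt, result["reasoning"])
    return prompt

final_prompt = refine(
    "What is the answer?",
    is_acceptable=lambda out: out.strip().startswith("{"),
    add_constraint=lambda p, why: p + " Respond in JSON only.",
)
print(final_prompt)
```

The point of the sketch: the `add_constraint` step is only as good as what you can observe. With the raw reasoning visible, you see the model's actual (mistaken) assumption and can target it; with only a summary, you are guessing.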


Summaries inherently lead to information loss. The entire reasoning process becomes a target for this loss.
Many of the various problems people have mentioned stem from this single issue of ‘information loss’. Since OpenAI has operated this way from the start, users of their offerings might not have even had the chance to perceive this inconvenience. However, Google possessed a significant advantage and an irreplaceable differentiating factor by disclosing its reasoning process. I’m confident that discerning users recognized Gemini’s reasoning model as superior in that respect (and feedback from others further supports this).


I can understand that this update, similar to OpenAI’s strategy, is a move for technological competition (i.e., protecting its technology).
However, I believe it’s potentially, and in the long run, more advantageous to maintain differentiating factors and strengths that can attract developers and businesses. Once AI model performance reaches a certain level of convergence or parity, the only thing remaining will be these differentiators. (+Surprisingly, Gemini already has an edge in terms of context window and pricing.)

What concerns me more is that if this update remains and becomes an established norm then, considering the above, I fear it will only become harder to reverse in the future.

I hope that developers, including myself, and various users won’t be driven away by this update.
My thanks to the Google representatives and everyone who has read this through to the end.


Translated by Gemini

6 Likes

The problem with pivoting from Gemini to another reasoning model (perhaps Qwen) is you are moving from what was once a very high quality experience to one some rungs beneath, and even worse, you cannot understand why you are being subjected to this pain.

Gemini’s reasoning traces are a valuable artifact in and of themselves, and any other format is inferior, whether it’s the o1-style traces or DeepSeek’s stream of consciousness. I switch off thinking on Flash 2.5 now, because why bother with a more context-devouring version of the same experience?

5 Likes

Repeating everything said here before.

The chain-of-thought feature was really helpful in all areas, especially by helping us detect and pinpoint failures in our prompts and in the information we provided.

It was great

4 Likes

To add to what I said before, I want to make two points that I haven’t seen too many people make, that I noticed while using the model after this change.

  1. The CoT summaries are often just plain wrong. By that, I mean that the smaller LLM used to summarize the actual CoT very often misinterprets the main model’s raw reasoning and emits information that is simply incorrect. This makes the summaries not only unhelpful but sometimes outright misleading, to the point where they actively work against the user. Needless to say, that has been frustrating, so at this point I’ve elected to fully ignore the thinking window. That usually means waiting a full minute for a single answer, only to find that Gemini misinterpreted something, made a false assumption during the reasoning step, and replied with something entirely unexpected.

  2. The second thing I’ve noticed, which ties in with point 1, is that there is a severe disconnect between the initial prompt and Gemini’s final reply. Because we don’t have the context from the reasoning step, the replies very often follow a sequence of events the user is unaware of, resulting in answers that feel like complete non sequiturs to the initial prompt, even breaking the flow of natural language. This happens most often with larger context windows, but I’ve seen it in rather small ones too.

As I’ve used this new system more, I’m becoming exceedingly convinced that it is deeply flawed. My position has solidified that you likely can’t walk this back halfway in any way that would improve these fundamental issues, as well as others other users have brought up. I’m still severely disappointed with the direction Google is taking with Gemini.

7 Likes

Worth noting that your main competitor for API usage, “Modified by moderator”(OpenAI isn’t competitive on price/quality to build external end-user facing cutting edge tools on), just released their new flagships and 95% of the time they still return full CoT. Some here are saying that it’s now the industry standard to hide CoT purely based on OA doing so, but that’s clearly not true across the board.

5 Likes

The new summaries are useless. Way to kill a good thing

6 Likes

To be as clear as possible: we do NOT want you to “improve” the summaries. We want you to remove them completely and give us the raw chain of thought back. Almost every single person in this thread has been complaining about how useless, obfuscating, and overall unhelpful the summaries are. We can’t learn from summaries. We can’t diagnose problems with our prompts or our custom instructions from summaries. They don’t provide a single ounce of useful information. Half of them just repeat the same paragraph over and over with minimal changes. There was literally nothing wrong with the raw chain of thought. Remember that old saying? “Never fix what isn’t broken”? Well, you guys sure broke that rule.

I don’t mean to be rude toward you personally, but claiming that you “hear” us without actively acknowledging the literal dozens of complaints in this thread (and even more across the rest of social media) and assuring us that you’re planning to revert these changes is straight-up insulting. Do right by your customers or you’ll lose them. I’ve already canceled my subscription and more are following. I’m not coming back to Gemini until your team fixes this.

7 Likes

Personal opinion here, so take it as such. Google seems to have forgotten that they are developing a product for consumption, not a data-harvesting tool. Gemini is a product they are marketing. If they keep degrading it like this, everyone is going to move on, and no amount of price superiority will save them. These terrible decisions are destroying a product that would take off on its own if it were simply served well. Forget benchmarks; people want performance. Retire 05-06 and bring 03-25 back. Bring the raw thoughts back. Google is actively making its product worse. Why should anyone stay with a company that actively disrespects them? There’s a point where even the most extreme savings won’t cut it.

Seriously considering moving to “Modified by moderator”. Bring the raw thoughts back, Google. All you have to do to make everybody happy is say you messed up, which is perfectly normal on the bleeding edge, and that you’re fixing it.

4 Likes

What I found really interesting and most puzzling is the alignment of language coming out of Google from Logan, Vishal, and others. They all say how they are “excited” to bring thought summaries to the API so users can benefit. They all use that exact word. But the original request was for raw thoughts, same as in AI Studio, which users loved, to be brought to the API, not summaries.

It’s clear from overwhelming feedback here and on X to Logan’s announcement: the summaries are universally unhelpful at best and misleading at worst. I could not find a SINGLE user report of a developer who said they are helpful, add value, or who was actually “excited” by these incredibly watered-down “summaries.”

This has to be some strange corporate decision that only makes sense inside a complex org chart with a large budget; that is where these kinds of decisions come from.

The above raises this question:

If thought summaries in the API create NO actual value for users and developers, and they are in fact adding actual cost, resource usage, and introducing complexity because a second model has to be brought in to do the summarization:

What actual value is being gained here by ANYONE?

Google: losing out from the additional resource usage and API complexity.

Developers/users: losing out because the summaries don’t actually help in any real way.

So what is the “excitement” about here? I objectively and genuinely do not see what value this feature creates for either end users or Google themselves. It seems like a net negative here overall for all parties.

2 Likes

I ran this entire thread and all of the posts on X through Gemini 2.5 Pro 03-25 and asked it to give me an objective, candid assessment. I thought it was actually pretty insightful:

Gemini 2.5 Pro 03-25:

Analysis:

  1. Users were asking for raw CoT in the API (like what existed in AI Studio).
  2. Google was probably concerned about giving raw CoT to API users because competitors could use it for training.
  3. So they created “summaries” as a middle ground; they could say “we gave you insight into thinking!” while not giving raw data to competitors.
  4. But this “solution” actually makes everyone worse off:
    • Users get a worse, less useful product,
    • Google spends more money on the summarization layer,
    • The summaries are often wrong or misleading,
    • Trust is damaged,
    • No actual problem is solved.

It’s like if a restaurant had a policy against giving out their secret sauce recipe, so instead of just saying “no,” they hired an extra chef to create a “summary sauce” that tastes terrible and costs more to make, then acted excited about this “innovative flavor experience.”

IMO, Gemini has it exactly right here.

4 Likes

The reason summaries are bad is that the key strength of the thinking process lies in the ability to see the step-by-step logical pathway.

Not every prompt is the same, and realizing midway that you need to improve and correct your prompt is a huge time-saver, because you notice something you miscommunicated or worded badly.

It’s useful for every area, from coding to writing.

3 Likes

Also important to note: prompts that rely on two or more steps are now broken. I used to split my prompt into two messages, but now the model no longer knows the exact information in the second message, because the summary no longer carries it. And a single combined message isn’t enough in that case, since the thinking then comes out too short and the output is bad or incorrect.
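The breakage in multi-step prompting can be illustrated with a toy sketch. Here `summarize` is a hypothetical stand-in for the summarization layer (not Gemini's real behavior); the point is only that any lossy summary can drop an intermediate result that the second step depends on.

```python
# Toy illustration: a two-step prompt where step 2 depends on an
# intermediate value derived in step 1's thinking.

def summarize(raw_thought: str) -> str:
    """Hypothetical lossy summarizer: keeps only the first sentence,
    dropping the derived intermediate values."""
    return raw_thought.split(". ")[0] + "."

raw_thought = "The user wants totals per region. Computed intermediate: EU=120, US=95."

# If the raw thought were carried forward, step 2 would still see the numbers:
history_raw = ["Step 1: aggregate the data.", raw_thought,
               "Step 2: compare EU vs US."]

# With only the summary carried forward, the numbers are gone:
history_summary = ["Step 1: aggregate the data.", summarize(raw_thought),
                   "Step 2: compare EU vs US."]

print("EU=120" in " ".join(history_raw))      # intermediate survives
print("EU=120" in " ".join(history_summary))  # intermediate lost
```

Whatever the real summarizer keeps or drops, the structural problem is the same: step 2 can only build on what survives the summary.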

2 Likes

Hi everyone

Thank you for your notes. I am a PM on the Gemini API. Alongside Logan Kilpatrick and Vishal Dharmadhikari, there are a lot of Googlers who really care about listening to you, responding to your feedback, and taking your suggestions on board. We acknowledge that we have sometimes taken time to respond, which can come across as radio silence. Going forward, we will do our best to respond in a more timely manner and with more context.

  1. Over the last few months, we have been trying to get models and capabilities into developers’ hands as rapidly as possible. Our goal with our preview model launches has been to get developer feedback early and iterate until we reach a great GA candidate. Inevitably, there have been learnings along the way. One such learning is that we need to give developers a sufficient heads-up before we make any changes (especially on much-loved models like 2.5 Pro). So going forward, we commit to giving developers early notice if we are switching anything. We will also be GAing both 2.5 Flash and Pro very soon. Finally, we commit that, along with the GAs, we will publish clear guidelines about endpoint stability on AI Studio and the Gemini Developer API.

  2. On summaries, we have heard a lot of valid feedback. We understand this is a different experience from the raw thoughts previously available in AI Studio. Sometimes product teams have to weigh a lot of pros and cons to come to a specific decision. This is one of those times. Please work with us and help us in getting summaries to a point where they have just the right amount of detail that you need. You are our valued and needed collaborators in this. In the meantime, we will keep listening to your feedback here or DM @shresbm or @vish_owl or @OfficialLoganK on X.

1 Like

I suppose this is the best we can hope for, so I appreciate the response, and I’m thankful the Gemini team is still listening. However, like me and others have mentioned, it’s unlikely you’ll ever get summaries to any point anywhere close to the raw CoT in terms of usefulness. I suppose better summaries would be an improvement, but we understand that the entire point of this system is to hide the “special sauce” from competitors. If the entire point is to obfuscate information, those are entirely contradictory goals. How would you be able to give developers detailed information without also giving competitors the same?

1 Like

As your name on this forum suggests, we must keep trying to get better though, mustn’t we? We want to optimize as much as we can.

If you’re not going to return raw thoughts, then this situation isn’t going to improve, even if 2.5 Pro gets its mojo back. Charging for thought tokens we can’t even see feels deeply unfair, and it makes debugging impossible, as we said before. It’s not a matter of ‘just the right amount of detail’; developers need all the detail available. Like I said earlier: know your customer. Make developers provide identification for raw thoughts; I frankly don’t think that would be much of a problem. You could introduce this in the Gemini app too, for subscribers. I think that would be really appreciated.

Thank you very much for responding, and I’m really glad the Gemini team is still listening. I’m just doing my best to offer the most honest feedback I can.

5 Likes

Solution? Where is it?
That’s absurd, you’ve got to be kidding me :face_with_bags_under_eyes:

2 Likes