Thinking with multi-turn conversations

Background

I’m experimenting with Gemini Pro 2.5 to enhance complex, multi-stage coding tasks—entirely automated, with no human in the loop. In these tasks, I expose the model to a wide range of tools, and the interaction typically looks like this:
user->tool-call->tool-response->tool-call->tool-response->tool-call->…

Issue

To make the model’s reasoning process more transparent, I’ve enabled the include_thoughts flag. However, at each stage of the task, when I inspect the thoughts summary, it appears as though the model is re-analyzing the problem from scratch. It doesn’t seem to reference its previous thinking or build on it explicitly.
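
For context, this is roughly how I request the summaries (a minimal sketch with the google-genai Python SDK; the model id and prompt are illustrative):

from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Translate this Angular directive to React ...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(include_thoughts=True),
    ),
)

# Thought summaries come back as parts flagged with thought=True
for part in response.candidates[0].content.parts:
    if part.thought:
        print("THOUGHT SUMMARY:", part.text)
    else:
        print("ANSWER:", part.text)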

Example:

Let’s say I ask the model to translate a file from language A to language B.

Here’s a shortened version of the model’s initial thought summary before calling the first tool:

Okay, here's the breakdown of my thought process, taking on the persona of an expert in the field, ready to tackle this translation task.

Alright, let's dive in. My goal is to find excellent examples within the target repository that will help me translate this XXX...

First things first, I need a solid understanding of the *source* file. Okay, it's an Angular directive, which means it's responsible for rendering part of the UI based on data. It has a template, handles a scope with ...

I'll also check the file XXX to see how they handled the component.

After the model returns the function call and I append the tool response to the history, I call the LLM again. The next thought summary seems to restart the reasoning process instead of continuing from the previous step:

Okay, here's the breakdown of my thought process as I tackle this translation task. I started by digging into the Angular code, looking at XXX. It's pretty straightforward – it's a directive that renders XXX which it then hands off to another directive YYY.

My initial thought is that XXX is a wrapper. It calls the renderer directive. 

I started by looking at ZZZ figuring that's a good name for a file that would use it, but unfortunately, it doesn't seem to exist.  So, I'm thinking, maybe the React translation has combined the logic. 

I then looked at the file itself. I'm seeing that it handles a lot of logic, particularly based on the sub-question type.

Now, based on the returned value I need to find out *what* uses ZZZ.  My gut feeling is that it's likely a component that handles displaying a page...

And again in the third turn:

Okay, let's get to work. My task is to find the React equivalent of an Angular directive, specifically XXX, within the target repository. 
...

While the model does make progress and the task does converge, the internal thought summaries feel disconnected, as if each one is starting fresh rather than continuing a coherent line of thought.

Question

  1. Are the thoughts in multi-turn conversations discarded between turns?
  2. Is there a way to instruct the model to reference or build upon prior thoughts instead of rethinking from first principles each time?

LLMs (and Gemini is no exception) are entirely stateless. They never remember anything and need to be re-fed everything on every single invocation. They don’t create a chain of thought once and then work through it - they are trained and instructed to “think out loud” and use this to control the output. Gemini provides a summary of the CoT from that one single output. The moment you send another request, this starts over from scratch, which is why the summary will be different each time.

LLM memory is an illusion. They don’t retain anything, including thoughts, because to them, that’s just a different part of the singular output.

Thanks for the reply, but you got me wrong.
In every LLM call I am passing the full message history.
So the content for the third LLM call looks something like this:

response = create(
  model="gemini-2.5-pro",
  contents=[
    <user: main prompt>,
    <assistant: reasoning + tool_call>,
    <tool response>,
    <assistant: reasoning + tool_call>,
    <tool response>,
  ])

The previous stage’s reasoning is there in the message list.
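
Roughly, the loop looks like this with the google-genai SDK (a simplified sketch; run_tool, my_tool_declarations and main_prompt are placeholders):

from google import genai
from google.genai import types

client = genai.Client()
config = types.GenerateContentConfig(
    tools=my_tool_declarations,  # placeholder: the tools I expose to the model
    thinking_config=types.ThinkingConfig(include_thoughts=True),
)

contents = [types.Content(role="user", parts=[types.Part(text=main_prompt)])]

while True:
    response = client.models.generate_content(
        model="gemini-2.5-pro", contents=contents, config=config
    )
    # Append the model turn as-is, including its thought parts and function call
    contents.append(response.candidates[0].content)

    if not response.function_calls:
        break  # the model produced a final answer instead of a tool call

    call = response.function_calls[0]
    result = run_tool(call)  # placeholder: executes the requested tool
    contents.append(
        types.Content(
            role="user",
            parts=[types.Part.from_function_response(
                name=call.name, response={"result": result}
            )],
        )
    )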

You’re not supposed to send the thought summaries back. The user/model response flow will be enough for it to understand how far through its CoT it is.

I doubt that. I’m sending the assistant message object (from the LLM response) back as-is. I don’t think Google expects me to manually exclude the ‘thought’ parts from the object.
What I’m not sure about is whether the thoughts are taken into account internally as context.

Does anyone have experience with thinking summaries and has investigated the thought content in multi-turn conversations?

The CoT is just for generating better responses in the moment. They are not meant to be sent back. Think of them as being for your debugging. Some apps will surface parts of them to show what is being “thought” about, but the chat history is meant to be the conversational/tool history only, and the thoughts aren’t part of that. This is why they come through as their own unique Part object in the candidates array, rather than a property on a response or FunctionCall entry.
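
If you do want to drop them before appending the model turn to the history, something like this sketch would do it (assuming the thought flag on Part from the SDK):

from google.genai import types

def strip_thoughts(content: types.Content) -> types.Content:
    # Keep only the text / function-call parts; drop the thought-summary parts
    kept = [p for p in content.parts if not p.thought]
    return types.Content(role=content.role, parts=kept)

# e.g. contents.append(strip_thoughts(response.candidates[0].content))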

If you really need them, you could take the thoughts and summarize them. Boil them down into something compact and tiny, and include that in the response. Flash-Lite could do this in a heartbeat. This would avoid polluting the context, as the more thoughts that are included, the more the model may inappropriately pattern-match on them.
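
Something along these lines (a sketch; the model id and prompt wording are just illustrative):

from google import genai

client = genai.Client()

def compress_thoughts(thought_text: str) -> str:
    # Boil a long thought summary down to a couple of sentences with a small, fast model
    summary = client.models.generate_content(
        model="gemini-2.5-flash-lite",
        contents="Summarize this reasoning in two sentences, keeping only the "
                 "decisions made and the open questions:\n\n" + thought_text,
    )
    return summary.text

# The compact note can then go back into the history as an ordinary text part,
# e.g. types.Part(text="(Prior reasoning: " + compress_thoughts(raw_thoughts) + ")")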

FYI, in o3 from OpenAI, the reasoning is indeed treated as part of the message list in multi-turn tool calling, and it is crucial for success.
See Better performance from reasoning models using the Responses API.

Cite: "In typical multi-turn conversations, you don’t need to include reasoning items or tokens—the model is trained to produce the best output without them. However, things change when tool use is involved. If a turn includes a function call (which may require an extra round trip outside the API), you do need to include the reasoning items—either via previous_response_id or by explicitly adding the reasoning item to input."
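
For comparison, that flow looks roughly like this with the Responses API (a sketch; tools, call_id and tool_result are placeholders):

from openai import OpenAI

client = OpenAI()

first = client.responses.create(
    model="o3",
    input="Translate this Angular directive to React ...",
    tools=tools,  # placeholder: the function tool definitions
)

# ... run the tool requested by the function_call item in first.output ...

second = client.responses.create(
    model="o3",
    previous_response_id=first.id,  # carries the reasoning items forward
    input=[{
        "type": "function_call_output",
        "call_id": call_id,     # placeholder: taken from the function_call item
        "output": tool_result,  # placeholder: the tool's result as a string
    }],
    tools=tools,
)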

Thanks for sharing the link. You are right that the OpenAI Responses API is stateful, meaning it retains conversational context and reasoning across multiple tool calls.

Right now, the Gemini API is stateless, so each tool call is handled on its own. That is why it seems like the model is starting over each time.

We will pass this along as a feature request; keeping reasoning across steps could really help in more advanced workflows.

Thanks