Gemini 2.5 Pro often not closing thoughts (05-06 does work correctly)

I am using Gemini in a setting where it’s extensively interacting with external tools. Essentially the process I’m looping is:

  1. User sends a message
  2. Gemini thinks, then either executes one or more function calls or sends a message without calling any functions
  3. If functions were called, Gemini receives the results, thinks, and writes a message to the user that synthesizes the results from the function calls

This process is explained to Gemini explicitly in the system prompt and it generally abides by that.

However, in step 3, the thinking stage is not cleanly delineated. I often receive messages from the API that open with a <thought> tag but never close it, even though the content clearly shifts from first-person thinking to addressing the user.
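To make the symptom concrete, here is a minimal sketch of how the unclosed tag could be detected on the client side. It assumes the thought block is meant to end with a literal </thought> tag, which I have not seen confirmed anywhere; split_thought and ThoughtSplit are hypothetical names, not part of any SDK. It only flags the problem and does not try to guess where the thinking ends.

```python
from typing import NamedTuple, Optional

OPEN_TAG = "<thought>"
CLOSE_TAG = "</thought>"  # assumption: the expected closing tag


class ThoughtSplit(NamedTuple):
    thought: Optional[str]  # text inside the thought block, if one was opened
    answer: str             # text meant for the user
    unclosed: bool          # True if <thought> was opened but never closed


def split_thought(content: str) -> ThoughtSplit:
    """Separate a leading <thought> block from the user-facing answer."""
    text = content.lstrip()
    if not text.startswith(OPEN_TAG):
        # No thought block at all: everything is the answer.
        return ThoughtSplit(None, content, False)

    body = text[len(OPEN_TAG):]
    end = body.find(CLOSE_TAG)
    if end == -1:
        # Tag never closed: we cannot tell where the thinking stops,
        # so report the whole body as thought and flag it.
        return ThoughtSplit(body.strip(), "", True)

    thought = body[:end].strip()
    answer = body[end + len(CLOSE_TAG):].lstrip()
    return ThoughtSplit(thought, answer, False)
```

With this, a response is affected whenever split_thought(message_content).unclosed is true, which makes it easy to count how often the problem occurs per model.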

I’m using the OpenAI compatible API.

In step 2, I’m making a request like:

                first_response = client.chat.completions.create(
                    model="gemini-2.5-pro",
                    messages=cast(List[ChatCompletionMessageParam], messages_to_send),
                    tools=cast(List[ChatCompletionToolParam], TOOLS),
                    extra_body={
                        "extra_body": {
                            "google": {"thinking_config": {"include_thoughts": True}}
                        }
                    },
                )

and in step 3:

                    final_response = client.chat.completions.create(
                        model="gemini-2.5-pro",
                        messages=cast(
                            List[ChatCompletionMessageParam], messages_to_send
                        ),
                        tools=[],
                        extra_body={
                            "extra_body": {
                                "google": {
                                    "thinking_config": {"include_thoughts": True}
                                }
                            }
                        },
                    )

Relevant preamble:

from typing import List, cast

from openai import APIConnectionError, OpenAI
from openai.types.chat import (
    ChatCompletionMessageParam,
    ChatCompletionToolParam,
)

client = OpenAI(
    api_key=api_key,
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)
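For context, the overall control flow of steps 1 to 3 can be sketched roughly as follows. This is a simplified illustration, not my actual app code: run_turn, complete and run_tool are hypothetical names, and complete stands in for a thin wrapper around client.chat.completions.create that returns the assistant message as a plain dict.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def run_turn(
    complete: Callable[[List[Message], List[Dict[str, Any]]], Message],
    run_tool: Callable[[str, str], str],
    messages: List[Message],
    tools: List[Dict[str, Any]],
) -> str:
    """Run one user turn: an optional round of tool calls, then a final reply."""
    # Step 2: the model may answer directly or request function calls.
    first = complete(messages, tools)
    messages = messages + [first]  # copy, so the caller's list is not mutated

    tool_calls = first.get("tool_calls") or []
    if not tool_calls:
        return first.get("content") or ""

    # Execute each requested function and append its result as a tool message.
    for call in tool_calls:
        fn = call["function"]
        result = run_tool(fn["name"], fn["arguments"])
        messages.append(
            {"role": "tool", "tool_call_id": call["id"], "content": result}
        )

    # Step 3: no tools offered, so the model has to write the final message.
    final = complete(messages, [])
    return final.get("content") or ""
```

The unclosed <thought> tag shows up in the content returned by the second complete call, i.e. the step 3 request with tools=[].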

The 05-06 model does not have this problem, so I am quite concerned about it being deprecated: the unclosed thoughts are causing real issues for my users’ experience.


It might be worth noting that the thought leakage still seems to occur even if I don’t set "include_thoughts": True in the "thinking_config" for step 3.


Hey @Sam_Van_Herwaarden, thanks for flagging this. I was able to reproduce the issue and saw the same behavior you described. I’m escalating it to the engineering team for further analysis.


Thanks, that’s a relief! I thought I was going crazy.

I’m really hoping that 05-06 can stay available until this is fixed, because for now switching to the new model is a no-go for me.


Is there any update on this issue, or a timeline for when I could expect it to be fixed?

The forced upgrade to the GA model now that 05-06 is deprecated has made this a lot more pressing for me. There is currently no Gemini API that works correctly with my app.
