I am using Gemini in a setting where it interacts extensively with external tools. Essentially, the process I’m looping is:
- Step 1: The user sends a message.
- Step 2: Gemini thinks and can choose to execute one or more function calls; alternatively, it can send a message (and not call any functions).
- Step 3: If functions were called, Gemini receives the results, thinks, and writes a message to the user that synthesizes the results of those function calls.
This process is explained to Gemini explicitly in the system prompt, and it generally abides by it.
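For concreteness, here is roughly how the messages list grows over a single turn; the system prompt text, user message, tool name, arguments, and call ID below are made-up placeholders, not my actual values:

```python
SYSTEM_PROMPT = "…explains the loop above to Gemini…"  # placeholder

# Shape of messages_to_send over one turn (placeholders only).
messages_to_send = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "What's the weather in Paris?"},  # step 1
    # Step 2: assistant turn that requests one or more function calls.
    {
        "role": "assistant",
        "tool_calls": [
            {
                "id": "call_123",  # placeholder ID
                "type": "function",
                "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
            }
        ],
    },
    # Results of executing those calls, one tool message per call.
    {"role": "tool", "tool_call_id": "call_123", "content": '{"temp_c": 21}'},
    # Step 3: the final assistant message synthesizing the results is what
    # comes back from the second API call shown further down.
]
```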
However, in step 3 the thinking stage does not seem to be cleanly delineated. I often receive messages from the API that open with a `<thought>` tag but never close it, even though I can clearly tell from the content of the message that there is a transition from first-person thinking to a mode where the model addresses the user.
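For what it’s worth, handling this means guessing where the thought ends. A minimal sketch of the kind of defensive stripping it forces, assuming the closing tag may or may not be present (the blank-line fallback is purely a heuristic of mine, not documented behavior):

```python
import re

def strip_thought(content: str) -> str:
    """Remove a leading <thought> block, whether or not it is closed."""
    # Well-formed case: <thought> ... </thought> at the start of the message.
    closed = re.match(r"\s*<thought>.*?</thought>\s*", content, flags=re.DOTALL)
    if closed:
        return content[closed.end():]
    # Unterminated case: drop everything up to the first blank line after the
    # opening tag and hope that is where the model switched to addressing the user.
    if content.lstrip().startswith("<thought>"):
        _, _, rest = content.partition("\n\n")
        return rest or content
    return content
```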
I’m using the OpenAI-compatible API.
In step 2, I’m making a request like:
```python
first_response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=cast(List[ChatCompletionMessageParam], messages_to_send),
    tools=cast(List[ChatCompletionToolParam], TOOLS),
    extra_body={
        "extra_body": {
            "google": {"thinking_config": {"include_thoughts": True}}
        }
    },
)
```
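Between the two requests I execute the requested functions and feed the results back into `messages_to_send`; roughly like this, with `run_tool` standing in for my actual dispatch logic:

```python
import json

assistant_msg = first_response.choices[0].message
messages_to_send.append(assistant_msg)  # keep the tool_calls turn in the history

for tool_call in assistant_msg.tool_calls or []:
    args = json.loads(tool_call.function.arguments)
    result = run_tool(tool_call.function.name, args)  # placeholder dispatch
    messages_to_send.append(
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result),
        }
    )
```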
and in step 3:
```python
final_response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=cast(List[ChatCompletionMessageParam], messages_to_send),
    tools=[],
    extra_body={
        "extra_body": {
            "google": {"thinking_config": {"include_thoughts": True}}
        }
    },
)
```
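The message content of this second response is what I show to the user, and it is where the unterminated `<thought>` tag appears:

```python
final_text = final_response.choices[0].message.content or ""
# final_text sometimes begins with "<thought>" and the tag is never closed.
```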
Relevant setup:

```python
from typing import List, cast  # used by the cast(...) calls above

from openai import APIConnectionError, OpenAI
from openai.types.chat import (
    ChatCompletionMessageParam,
    ChatCompletionToolParam,
)

client = OpenAI(
    api_key=api_key,
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)
```
The 05-06 model does not exhibit this issue, so I am quite concerned about it being deprecated; this is causing real problems for my users’ experience.