When using reasoning models (gemini-2.0-flash-thinking-exp-1219) in OpenAI compatibility mode, the reasoning tokens are combined with the completion tokens without any sort of delimiter.
It’s a great improvement over OpenAI’s own implementation that Gemini gives access to the reasoning tokens, but it would be helpful to separate them from the output with some sort of delimiter (e.g. using <thinking> tags) to make it easier to parse the response.
I know this is supported in the first-party Gemini SDK, by checking part.thought, but the Gemini SDKs are incredibly confusing (there’s like 3 different SDKs, can’t find documentation on what each one is for, and they’re all frustrating to work with compared to the OpenAI API).
I saw this post and was curious to test it myself, and I can confirm the issue.
I know this is supported in the first-party Gemini SDK, by checking part.thought , but the Gemini SDKs are incredibly confusing (there’s like 3 different SDKs, can’t find documentation on what each one is for, and they’re all frustrating to work with compared to the OpenAI API).
In the current version of the SDK, part.thought doesn’t seem to be getting filled in… there are just two separate parts, and the first one is the thinking process. The code was very short:
import google.genai as genai
client = genai.Client(api_key="API_KEY")
response = client.models.generate_content(
model='gemini-2.0-flash-thinkingexp',
contents='Tell me a programming joke.'
)
for candidate in response.candidates:
for part in candidate.content.parts:
if part.text:
print(part.text)
print("-" * 80) # Separator line
Probably even shorter than the OpenAI SDK.
It is unfortunate that AI Studio shows one Python SDK, but recommends another (without providing the code).