Reasoning tokens combined with completion tokens in OpenAI compatibility mode

Sierra · December 31, 2024, 7:36pm

When using reasoning models (gemini-2.0-flash-thinking-exp-1219) in OpenAI compatibility mode, the reasoning tokens are combined with the completion tokens without any sort of delimiter.

It’s a great improvement over OpenAI’s own implementation that Gemini gives access to the reasoning tokens, but it would be helpful to separate them from the output with some sort of delimiter (e.g. using <thinking> tags) to make it easier to parse the response.

I know this is supported in the first-party Gemini SDK, by checking part.thought, but the Gemini SDKs are incredibly confusing (there’s like 3 different SDKs, can’t find documentation on what each one is for, and they’re all frustrating to work with compared to the OpenAI API).

Jitendra_kumar_Mahen · January 4, 2025, 6:29pm

Wonderful full capture Gemini api identify that targets let’s top specific for counting the occurrence

coder543 · January 6, 2025, 4:14pm

I saw this post and was curious to test it myself, and I can confirm the issue.

I know this is supported in the first-party Gemini SDK, by checking part.thought , but the Gemini SDKs are incredibly confusing (there’s like 3 different SDKs, can’t find documentation on what each one is for, and they’re all frustrating to work with compared to the OpenAI API).

In the current version of the SDK, part.thought doesn’t seem to be getting filled in… there are just two separate parts, and the first one is the thinking process. The code was very short:

import google.genai as genai

client = genai.Client(api_key="API_KEY")

response = client.models.generate_content(
    model='gemini-2.0-flash-thinkingexp',
    contents='Tell me a programming joke.'
)

for candidate in response.candidates:
    for part in candidate.content.parts:
        if part.text:
            print(part.text)
            print("-" * 80)  # Separator line

Probably even shorter than the OpenAI SDK.

It is unfortunate that AI Studio shows one Python SDK, but recommends another (without providing the code).

Sierra · January 21, 2025, 6:24am

The new DeepSeek R1 model has a really nice way of dealing with this in their version of OpenAI compatibility.

It would be awesome if the reasoning Gemini models could implement something similar in their OpenAI compatibility version.

reasoning_content：The content of the CoT, which is at the same level as content in the output structure.

Source: https://api-docs.deepseek.com/guides/reasoning_model

gb_sheet · January 23, 2025, 11:56am

Sierra:

When using reasoning models (gemini-2.0-flash-thinking-exp-1219) in OpenAI compatibility mode, the reasoning tokens are combined with the completion tokens without any sort of delimiter.

It’s a great improvement over OpenAI’s own implementation that Gemini gives access to the reasoning tokens, but it would be helpful to separate them from the output with some sort of delimiter (e.g. using <thinking> tags) to make it easier to parse the response.

I know this is supported in the first-party Gemini SDK, by checking part.thought, but the Gemini SDKs are incredibly confusing (there’s like 3 different SDKs, can’t find documentation on what each one is for, and they’re all frustrating to work with compared to the OpenAI API).

Screenshot 2024-12-31 at 11.27.24 AM1812×1114 181 KB

a great capture identify by gimini

Kostiantyn · February 17, 2025, 4:30am

Same here. Hopefully, this will be resolved. One quick solution could be to hide the reasoning tokens once and for all in the OpenAI SDK (if possible, of course).

gb_sheet · April 18, 2025, 12:46pm

Totally agree — having a clear delimiter like <thinking> would make parsing so much easier when using Gemini in OpenAI-compatible mode. Access to reasoning tokens is great, but without structure, it’s messy. Hopefully they add an option to toggle structured output soon!

Topic		Replies	Views
How can I get Thinking Content from Gemini-2.5-pro or flash with OpenAI SDK Gemini API gemini-flash , open-ai	1	169	July 9, 2025
Has the API changed in gemini-2.0-flash-thinking-exp-01-21 Google AI Studio api , gemini-flash	5	965	May 21, 2025
Gemini 2.5 Pro often not closing thoughts (05-06 does work correctly) Gemini API function-calling , gemini-2-5	5	391	November 26, 2025
Gemini 2.5 Flash Thinking Tokens using OpenAI API Gemini API help_request	16	1493	June 12, 2025
Thoughts are missing (CoT not included anymore) Gemini API gemini-20	12	3019	February 11, 2025

Reasoning tokens combined with completion tokens in OpenAI compatibility mode

Related topics