Gemini 2.5 Flash Thinking Tokens using OpenAI API

vakdeagle · April 17, 2025, 8:36pm

Hi,

Is it possible to enable/disable thinking or set thinking_tokens for Gemini 2.5 Flash using the OpenAI compatible API?

Thanks

afon · April 18, 2025, 4:50am

I’m also interested in this question regarding controlling thinking with the OpenAI compatible API.

terobox · April 18, 2025, 7:29am

Waiting for updates.

Baptiste_C · April 18, 2025, 11:33am

It would be preferable for the model to have two endpoints (model names): one with thinking and one without, which would make integration easier. A bit like what OpenRouter currently offers.

terobox · April 18, 2025, 12:24pm

Or, like Claude, add extra request parameters for more control.

Facadoooo · April 18, 2025, 5:09pm

Maybe even like Requesty, where you can set one of four thinking-effort levels?

Sina_Azizi · April 19, 2025, 3:00am

Same here. I would like to see the OpenAI-compatible API support thinking tokens and reasoning traces.

Axis · April 19, 2025, 11:00am

Same here; I want to be able to enable and disable thinking mode.

AG_AssetPlan · April 21, 2025, 1:02pm

The official documentation says to set the thinking budget to 0 if you want to disable thinking.

vakdeagle · April 21, 2025, 1:13pm

For the native Gemini API, yes, but not for the OpenAI-compatible API.
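For comparison, here is a minimal sketch of the native (non-OpenAI) call with thinking disabled, using only the Python standard library. It assumes the REST field path generationConfig.thinkingConfig.thinkingBudget, the x-goog-api-key header, and an API key stored in a GEMINI_API_KEY environment variable (that variable name is my choice); the helper names are hypothetical.

```python
import json
import os
import urllib.request

# Native Gemini REST base URL (not the OpenAI-compatible /openai/ path).
BASE = "https://generativelanguage.googleapis.com/v1beta/models"


def build_native_body(prompt: str, thinking_budget: int) -> dict:
    """Body for :generateContent; thinking_budget=0 asks 2.5 Flash not to think."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"thinkingConfig": {"thinkingBudget": thinking_budget}},
    }


def generate(model: str, prompt: str, thinking_budget: int = 0) -> dict:
    """POST to the native endpoint and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{BASE}/{model}:generateContent",
        data=json.dumps(build_native_body(prompt, thinking_budget)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: key kept in an environment variable named GEMINI_API_KEY.
            "x-goog-api-key": os.environ["GEMINI_API_KEY"],
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Build (but do not send) a request with thinking turned off.
body = build_native_body("What is the capital of India?", thinking_budget=0)
print(json.dumps(body["generationConfig"]))
```

Calling `generate("gemini-2.5-flash-preview-04-17", "...")` would send it; the budget lives in the request body, which is exactly the knob the OpenAI-compatible endpoint did not expose at this point in the thread.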

Jouidah · April 21, 2025, 9:41pm

Got one better:

Find a way to make the AI agent think less and generate more.

I am using a checkpoint approach, and I'm currently building a large-scope project without typing a single line of code.

So far the process for the whole app has been slow but stable, but one of the reasons is that I AM USING A PHONE TO CODE AND HOST THE PROJECT.

My last test run using my method resulted in:

130k+ tokens on load

thought for 13s

generated for 200+s

and the generated content was very precise.
I need some help making AITHER using this method,
and I have plans for enhancing the current method.

If you need proof, please tell me how to share the data; this is my first login here.

Stefan_Streichsbier · April 22, 2025, 1:17am

There is an update from Logan. It’s not yet possible, but the team is working on it.
“Modified by moderator”

Dor_Alboim · April 24, 2025, 9:31am

@Stefan_Streichsbier
Is there an estimate of when it will be supported?

Thank you!

Sina_Azizi · April 27, 2025, 3:58am

OpenAI compatibility for Gemini Flash thinking is now available. You just have to set reasoning_effort to one of none, low, medium, or high.
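A minimal sketch of the request this describes, sent to the same OpenAI-compatible endpoint used later in this thread and written with only the standard library. The helper names are hypothetical, and the allowed values are simply the four listed in the post, not an official enumeration.

```python
import json
import urllib.request

# OpenAI-compatible Chat Completions endpoint (same URL as the repro script below).
URL = "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions"
# The four values mentioned in the post; "none" requests no thinking.
EFFORTS = {"none", "low", "medium", "high"}


def build_chat_body(model: str, prompt: str, reasoning_effort: str) -> dict:
    """Chat Completions body with reasoning_effort set."""
    if reasoning_effort not in EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {sorted(EFFORTS)}")
    return {
        "model": model,
        "reasoning_effort": reasoning_effort,
        "messages": [{"role": "user", "content": prompt}],
    }


def post_chat(body: dict, api_key: str) -> dict:
    """Send the body with a Bearer token and return the parsed JSON response."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Build (but do not send) a no-thinking request.
body = build_chat_body("gemini-2.5-flash-preview-04-17", "What is 2 + 2?", "none")
print(body["reasoning_effort"])
```

Validating the value locally just turns a typo into an immediate error instead of a round trip; drop the check if the server starts accepting other levels.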

Shilad_Sen · April 28, 2025, 1:49pm

This is great news! Flash 2.5 is a super promising model for us. However, about 25% of the time we see very long latencies for simple requests (e.g., 5-7 s) that are inconsistent with Flash 2.0 and with other Flash 2.5 calls (which take roughly 1 s). I wonder whether reasoning_effort is sometimes ignored, as described in this bug? “Modified by moderator”

Artificialanalysis.ai shows similarly high latencies (Gemini 2.5 Flash: API Provider Performance Benchmarking & Price Analysis | Artificial Analysis), and I imagine this is not the expected latency for the model. This makes it unusable for agentic work.
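One way to quantify an "about 25% of the time" observation is to time each call and report the share above a cutoff alongside the median. This sketch assumes a 3-second cutoff (an arbitrary choice) and uses hypothetical helper names; the sample numbers are made up, not measurements.

```python
import statistics
import time
from typing import Callable, Sequence


def timed(fn: Callable[[], object]) -> float:
    """Run fn once and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def summarize_latencies(latencies: Sequence[float], slow_threshold: float = 3.0) -> dict:
    """Median latency and the fraction of calls slower than slow_threshold seconds."""
    if not latencies:
        raise ValueError("need at least one latency")
    slow = sum(1 for t in latencies if t > slow_threshold)
    return {
        "median_s": statistics.median(latencies),
        "slow_fraction": slow / len(latencies),
    }


# Illustrative numbers only: one slow call out of four.
print(summarize_latencies([0.9, 1.0, 1.2, 6.0]))
```

In practice each API call would be wrapped as `timed(lambda: requests.post(url, headers=headers, json=body))` inside a loop. Reporting the median next to the slow fraction separates "the model is slow" from "a minority of calls take a much slower path".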

Shilad_Sen · May 2, 2025, 1:23am

Update: Here’s a minimal example that reproduces the behavior. It’s a little tricky to get it to reason consistently, but this seems to produce reasoning tokens about 30% of the time.

I can actually get it to produce the behavior without structured output; it’s just rarer (maybe 5% of requests).


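import requests  # third-party HTTP client: pip install requests

# OpenAI-compatible Chat Completions endpoint for Gemini
url = "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions"
api_key = 'XXXX'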
msgs = [
    "If you remove every other letter from 'SUBSTANTIATION,' what's the resulting word?",
    "If yesterday's tomorrow is Friday, what day is two days before today's yesterday?",
    "How many unique ways can you arrange the letters in 'MISSISSIPPI'?",
    "Explain briefly why mirrors reverse left-to-right but not up-to-down.",
    "If all roses are flowers and some flowers fade quickly, must some roses fade quickly?",
    "Is it logically possible for an omnipotent being to create a rock it can't lift?",
    "Which weighs more: a pound of feathers on Earth or a pound of iron on the Moon?",
    "If all cats chase some mice and all mice fear all dogs, do all cats fear some dogs?",
    "Does the set of all sets that don't contain themselves contain itself?",
    "If two people each flip a fair coin five times, what's the probability their results match exactly?",
    "Explain in one sentence why multiplying two negative numbers yields a positive result.",
    "If there are three apples and you take two, how many apples do you have?",
    "Can a statement be both completely true and completely false simultaneously?",
    "If you always lie and you say 'I always lie,' are you lying or telling the truth?",
    "If today is Wednesday, what is the day 1000 days from now?",
    "Which is larger: 2^30 or 3^20?",
    "A triangle has angles in a 1:2:3 ratio; what are the three angle measurements?",
    "Can the average height of a population increase even if every individual's height decreases?",
    "Does adding salt to water increase or decrease its freezing point?",
    "Which has a greater perimeter: a square with an area of 16 or a rectangle with an area of 16 and dimensions 1 by 16?"
]

for msg in msgs:
    headers = {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer ' + api_key,
    }
    body = {
        "model": 'gemini-2.5-flash-preview-04-17',
        "reasoning_effort": "none",
        "messages": [
            {
                "role": "system",
                "content": 'You are a helpful assistant. '
                           'Answer the question using reasoning, with careful step-by-step reasoning before producing an answer.'
            },
            {
                "role": "user",
                "content": msg
            }
        ],
        "response_format": {
            "type": 'json_schema',
            "json_schema": {
                'name': 'result',
                'schema': {
                    'type': 'object',
                    'properties': {
                        'explanation': {
                            'type': 'string'
                        },
                        'answer': {
                            'type': 'string'
                        }
                    },
                    'required': ['explanation', 'answer'],
                }
            }
        }
    }
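    r = requests.post(url, headers=headers, json=body)
    r.raise_for_status()  # surface HTTP errors instead of a KeyError on 'usage'
    js = r.json()
    # Tokens beyond prompt + completion are treated as reasoning (thinking) tokens
    reasoning_tokens = (js['usage']['total_tokens'] -
                        js['usage']['prompt_tokens'] -
                        js['usage']['completion_tokens'])
    print(reasoning_tokens)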
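To turn the per-request counts printed above into a percentage like the ~30% quoted, one can collect each response's js['usage'] and count how many have a nonzero difference. This reuses the post's formula (total minus prompt minus completion) rather than any documented reasoning-token field; the function names are hypothetical and the usage dicts are illustrative, not real responses.

```python
from typing import Iterable, Mapping


def reasoning_tokens(usage: Mapping[str, int]) -> int:
    """Tokens not accounted for by prompt or completion (the formula from the post)."""
    return usage["total_tokens"] - usage["prompt_tokens"] - usage["completion_tokens"]


def share_with_reasoning(usages: Iterable[Mapping[str, int]]) -> float:
    """Fraction of responses that spent any tokens on reasoning."""
    counts = [reasoning_tokens(u) for u in usages]
    if not counts:
        return 0.0
    return sum(1 for c in counts if c > 0) / len(counts)


# Illustrative usage blocks: only the second one shows hidden reasoning tokens.
sample = [
    {"prompt_tokens": 40, "completion_tokens": 25, "total_tokens": 65},
    {"prompt_tokens": 38, "completion_tokens": 30, "total_tokens": 412},
    {"prompt_tokens": 41, "completion_tokens": 22, "total_tokens": 63},
]
print(share_with_reasoning(sample))
```

With reasoning_effort set to "none", anything above 0.0 here is a request where the setting appears not to have been honored.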

Kiran_Sai_Ramineni · June 12, 2025, 6:52am

Hi @vakdeagle, to disable thinking using OpenAI compatibility, you can use the following code:

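from openai import OpenAI

# OpenAI SDK pointed at Gemini's OpenAI-compatible base URL
client = OpenAI(
    api_key="GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

response = client.chat.completions.create(
  model='gemini-2.5-flash-preview-05-20',
  extra_body={
      'extra_body':{
          'google': {
              'thinking_config': {
                  # thinking_budget 0 turns thinking off on 2.5 Flash;
                  # include_thoughts only controls whether thought summaries are returned
                  'thinking_budget': 0,
                  'include_thoughts': False
              }
          }
      }
  },
  messages=[
      {"role": "system", "content": "You are a helpful assistant."},
      {
        "role": "user",
        "content": 'what is the capital of india'
      }
  ]
)
print(response.choices[0].message.content)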

To enable thinking and set a thinking budget, define thinking_config like this:

'thinking_config': {
    'thinking_budget': 800,
    'include_thoughts': True
}

Thank you.
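The doubled extra_body is easy to misread as a typo. My understanding is that the OpenAI Python SDK's extra_body= argument merges its dict into the top level of the request JSON, so the inner 'extra_body' key is what actually reaches Gemini's endpoint. If you call the REST endpoint directly (for example with requests, as in the reproduction script earlier in the thread), the equivalent JSON body would look roughly like this sketch; the helper name is hypothetical.

```python
import json


def build_thinking_body(model: str, prompt: str, thinking_budget: int,
                        include_thoughts: bool) -> dict:
    """Raw Chat Completions JSON carrying Gemini's thinking_config.

    When sending JSON yourself there is only one 'extra_body' level: it is a
    top-level key of the request body, next to 'model' and 'messages'.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {
            "google": {
                "thinking_config": {
                    "thinking_budget": thinking_budget,
                    "include_thoughts": include_thoughts,
                }
            }
        },
    }


body = build_thinking_body("gemini-2.5-flash-preview-05-20",
                           "What is the capital of India?", 800, True)
print(json.dumps(body["extra_body"], indent=2))
```

Passing `thinking_budget=0` with `include_thoughts=False` would be the direct-HTTP counterpart of the disable-thinking call above.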
