Pricing for Gemini 2.5 API: With and Without Thinking Option in the Official Release

In the preview version of Gemini 2.5, I understand that the output token pricing differed depending on whether thinking-option use was enabled or not.

However, in the pricing table for the official release of Gemini 2.5, there is no separate mention of pricing for thinking-option, and it appears that both cases are priced at $2.50 per 1 million output tokens.

In the official release of Gemini 2.5, if I choose the “no thinking” option, will I still be charged $2.50? Or is there a lower price that simply isn’t explicitly stated in the documentation? If so, what is the specific rate for usage without the thinking option?

Thank you.

Hi @mtc, the documentation lists the output price (including thinking tokens) as $2.50 per 1M tokens. As I understand it, if you disable thinking, fewer output tokens are generated because no thoughts are produced, so the price will also go down. Thank you.


It’s slightly misleading, but there is no difference in cost between output with or without thinking. You’re paying for the output tokens. You can disable thinking (in Flash only, you can’t in Pro); if you leave it enabled, the thinking tokens are counted as output and you pay for them like any other output token.

What is important to understand is that the thought summaries you get sent are NOT what you were charged for! You were charged for the full chain of thought. The summary is simply a summary of that. You get the total tokens in the API result, every time, so you can see exactly how many were used.
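A minimal sketch of that check, assuming the usage metadata reports response tokens and thinking tokens as separate counts. The field names `candidates_token_count` and `thoughts_token_count` mirror what the google-genai Python SDK appears to expose on `usage_metadata`, but treat them as assumptions and verify against your own response object. The `Usage` class is a hypothetical stand-in so the snippet runs without an API call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    """Hypothetical stand-in for the usage metadata returned with a response."""
    prompt_token_count: int
    candidates_token_count: int          # visible response tokens (assumed to exclude thoughts)
    thoughts_token_count: Optional[int]  # full chain of thought; may be None when thinking is off


def billed_output_tokens(usage: Usage) -> int:
    """Output tokens you pay for: response tokens plus all thinking tokens."""
    return usage.candidates_token_count + (usage.thoughts_token_count or 0)


# Illustrative numbers only: a short visible answer backed by a long chain of thought.
usage = Usage(prompt_token_count=120, candidates_token_count=400, thoughts_token_count=1600)
print(billed_output_tokens(usage))  # 2000
```

The point of the example is that the 1600 thinking tokens count toward the bill even though only a short summary of them is ever shown.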

Hi @Kiran_Sai_Ramineni, thank you for your response!

Yes, that makes sense — disabling the “thinking” option should reduce the total number of output tokens, so the overall cost would naturally go down as well.

Just to clarify, what I was trying to ask (and I hope this didn’t come across too blunt — English isn’t my first language!) is whether the per-token rate itself changes depending on whether “thinking” is enabled or not. From the current documentation, it seems like the rate remains fixed at $2.50 per million output tokens, and the only difference is in the token volume. But I was wondering if there’s any chance that a different internal rate applies behind the scenes when “thinking” is turned off — since that distinction existed in the preview version.

Really appreciate your help and insights — thanks again!

@Richard_Davey, Thanks — that helps clarify things.

From your explanation, I understand that in the official release of gemini-2.5-flash, the per-token cost is the same whether or not thinking is enabled.

If I’m aiming for something closer to the gemini-2.0-flash pricing, I’ll probably take a look at the preview version of gemini-2.5-flash-lite.

Also, I’ll make sure to rely on the actual token usage reported in the API, rather than just the visible thought summaries, when reviewing costs.

Appreciate the insight.

Correct, the output token cost is fixed, regardless of whether it was a chain-of-thought token or a response text token. To get the costs down, you can either disable thinking (which is fine if your prompt doesn’t really need it), or look at using Flash Lite, but again, disable thinking there too.
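To make the arithmetic concrete, here is a rough sketch using the $2.50 per 1M output tokens rate quoted in this thread. The function name and the token counts are made up for illustration; real counts depend entirely on the prompt.

```python
OUTPUT_PRICE_PER_MILLION_USD = 2.50  # rate quoted in this thread for gemini-2.5-flash output


def output_cost_usd(response_tokens: int, thinking_tokens: int = 0) -> float:
    """Cost of output at one flat rate; thinking tokens are billed like response tokens."""
    total_tokens = response_tokens + thinking_tokens
    return total_tokens * OUTPUT_PRICE_PER_MILLION_USD / 1_000_000


# Illustrative token counts, not measurements.
with_thinking = output_cost_usd(response_tokens=500, thinking_tokens=3_500)
without_thinking = output_cost_usd(response_tokens=500)
print(f"with thinking: ${with_thinking:.5f}, without: ${without_thinking:.5f}")
```

The per-token rate is identical in both calls; the saving comes only from generating fewer tokens.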

For our use-case (code generation) thinking is invaluable. But in lots of cases it doesn’t make any marked difference. ymmv!
