Thinking Tokens Counted, but Billed as Non-Thinking

Hi there.

According to the following documentation, the Gemini 2.5 series automatically decides whether to think, and how much, based on the input:

Gemini thinking  |  Gemini API  |  Google AI for Developers

Models with thinking capabilities are available in Google AI Studio and through the Gemini API. Thinking is on by default in both the API and AI Studio because the 2.5 series models have the ability to automatically decide when and how much to think based on the prompt.

Also, when I use Gemini 2.5 Flash (gemini-2.5-flash-preview-04-17) and check usage_metadata.thoughts_token_count in the response, I see values ranging from around 2,000 to 5,000. (When I set thinking_budget to 0, I confirmed that the value drops to 0 and the model becomes noticeably dumber.)

However, in the Google Cloud billing report, all outputs are listed as “Generate content output token count gemini 2.5 flash short output text non-thinking”.

Why is this happening? Has the usage simply not been billed as “thinking” yet, even though three days have passed?

sot

from google import genai
from google.genai import types

class Test:
    def __init__(self, gemini_api_key: str):
        self.API_KEY = gemini_api_key
        self.client = genai.Client(api_key=gemini_api_key)

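    def talk(self, message):
        # Send one prompt with an explicit thinking budget (in tokens).
        r = self.client.models.generate_content(
            model="gemini-2.5-flash-preview-04-17",
            contents=message,
            config=types.GenerateContentConfig(
                thinking_config=types.ThinkingConfig(thinking_budget=10000)
            ),
        )
        # Return the response so the caller can inspect r.usage_metadata.thoughts_token_count.
        return r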
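
To compare what the API reports against the billing line items, it may help to log the token counts per request. The following is only a rough sketch: summarize_usage is a hypothetical helper (not part of the SDK), it assumes it receives an object shaped like response.usage_metadata, and it assumes candidates_token_count does not already include the thinking tokens. The SimpleNamespace object is a stand-in with made-up numbers so the sketch runs without an API key.

```python
from types import SimpleNamespace


def summarize_usage(usage):
    """Collect token counts from a usage_metadata-like object (hypothetical helper)."""

    def count(name):
        # A field may be missing or None (for example with thinking_budget=0),
        # so fall back to 0 in both cases.
        return getattr(usage, name, None) or 0

    thoughts = count("thoughts_token_count")
    candidates = count("candidates_token_count")
    # Assumption: thoughts and candidates are reported separately,
    # so their sum is the full generated output.
    output_total = thoughts + candidates
    return {
        "prompt": count("prompt_token_count"),
        "thoughts": thoughts,
        "candidates": candidates,
        "thinking_share": thoughts / output_total if output_total else 0.0,
    }


# Stand-in for response.usage_metadata; the numbers are invented for illustration.
fake_usage = SimpleNamespace(
    prompt_token_count=120,
    candidates_token_count=500,
    thoughts_token_count=2000,
)
summary = summarize_usage(fake_usage)
print(summary)
```

With a real response, the call would presumably be summarize_usage(r.usage_metadata), and a non-zero "thoughts" value is the number to compare against the "thinking" entry in the billing report.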

Solved: “Generate content output token count gemini 2.5 flash short input text” was eventually added to the billing report, although quite late.