Hi there.
According to the following documentation, the Gemini 2.5 series models automatically decide whether and how much to think based on the input:
Gemini thinking | Gemini API | Google AI for Developers
Models with thinking capabilities are available in Google AI Studio and through the Gemini API. Thinking is on by default in both the API and AI Studio because the 2.5 series models have the ability to automatically decide when and how much to think based on the prompt.
Also, when I use Gemini 2.5 Flash (gemini-2.5-flash-preview-04-17) and check usage_metadata.thoughts_token_count in the response, I see values ranging from around 2,000 to 5,000. (When I set thinking_budget to 0, the value drops to 0 and the model becomes noticeably dumber.)
However, in the Google Cloud billing report, all outputs are listed as “Generate content output token count gemini 2.5 flash short output text non-thinking”.
Why is this happening? Has it just not been billed as “thinking” yet, even though it’s been three days?
from google import genai
from google.genai import types


class Test:
    def __init__(self, gemini_api_key: str):
        self.API_KEY = gemini_api_key
        self.client = genai.Client(api_key=gemini_api_key)

    def talk(self, message):
        r = self.client.models.generate_content(
            model="gemini-2.5-flash-preview-04-17",
            contents=message,
            config=types.GenerateContentConfig(
                thinking_config=types.ThinkingConfig(thinking_budget=10000)
            ),
        )
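
The snippet above stops before the usage metadata is read. Below is a minimal sketch of how the thinking token count could be pulled out of a response for comparison against the billing report. `summarize_usage` is a hypothetical helper, not part of the SDK. Apart from `thoughts_token_count`, the field names (`prompt_token_count`, `candidates_token_count`, `total_token_count`) are assumed to match the SDK's usage metadata object, and a stand-in object replaces a real API response so the example runs without an API key.

```python
from types import SimpleNamespace


def summarize_usage(response):
    """Collect token counts from response.usage_metadata into a dict.

    Hypothetical helper: missing or None fields are reported as 0.
    """
    usage = getattr(response, "usage_metadata", None)

    def count(name):
        # Fields may be absent or None when unset, so normalize to 0.
        return getattr(usage, name, None) or 0

    thoughts = count("thoughts_token_count")
    return {
        "prompt": count("prompt_token_count"),          # assumed field name
        "candidates": count("candidates_token_count"),  # assumed field name
        "thoughts": thoughts,
        "total": count("total_token_count"),            # assumed field name
        "used_thinking": thoughts > 0,
    }


# Stand-in for a real response (illustrative numbers only).
fake = SimpleNamespace(
    usage_metadata=SimpleNamespace(
        prompt_token_count=12,
        candidates_token_count=340,
        thoughts_token_count=2150,
        total_token_count=2502,
    )
)
print(summarize_usage(fake))
```

With the class above, `talk` would need to return `r` before `summarize_usage(r)` could be called on a real response. Logging this dict per request gives a record to compare against the SKU names in the billing report.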