Here is an example prompt that reproduces this issue:
Task: Translate this text to native fluent English.
Text: Yardım bile ediyor bana küçük hanım.
Generation config:
from google import genai
from google.genai.types import GenerateContentConfig, ThinkingConfig

client = genai.Client()

response = client.models.generate_content(
    contents=[prompt],  # the prompt shown above
    model="gemini-2.5-flash-preview-09-2025",
    config=GenerateContentConfig(
        response_mime_type="application/json",
        # Thinking explicitly disabled.
        thinking_config=ThinkingConfig(thinking_budget=0),
    ),
)
Response usage_metadata:

cache_tokens_details=None
cached_content_token_count=None
candidates_token_count=9
candidates_tokens_details=None
prompt_token_count=29
prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=29)]
thoughts_token_count=478
tool_use_prompt_token_count=None
tool_use_prompt_tokens_details=None
total_token_count=516
traffic_type=None

Note that thoughts_token_count is 478 even though thinking_budget was set to 0.
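For completeness, these numbers can be read straight off the response object; a minimal sketch, using the response from the call above:

# Minimal sketch: inspect the usage metadata of the response above.
usage = response.usage_metadata
print("thoughts_token_count:", usage.thoughts_token_count)   # prints 478 here
print("candidates_token_count:", usage.candidates_token_count)  # prints 9
print("total_token_count:", usage.total_token_count)         # prints 516
# With thinking_budget=0 the expectation is 0 (or None) thought tokens.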
Interestingly, removing response_mime_type="application/json" resolves the issue, and the model consistently outputs 0 thinking tokens. But I need a JSON response, since I use structured outputs. gemini-2.5-flash-preview-05-20 does not exhibit this issue.
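For comparison, the control call I mean is the one below: identical to the repro except that response_mime_type is dropped.

# Control case: same prompt, model, and thinking budget, but no
# response_mime_type. Per the observation above, this consistently
# yields 0 thinking tokens.
response_plain = client.models.generate_content(
    contents=[prompt],
    model="gemini-2.5-flash-preview-09-2025",
    config=GenerateContentConfig(
        thinking_config=ThinkingConfig(thinking_budget=0),
    ),
)
print(response_plain.usage_metadata.thoughts_token_count)  # 0 / None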