It is not possible to disable thinking in gemini-2.5-flash-preview-09-2025, even when using thinking_config=genai_types.ThinkingConfig(thinking_budget=0).
This only happens with the preview-09-2025 model.
Hello
Welcome to the forum!
Are you still running into this issue with thinking_budget=0 being ignored? If so, would you mind sharing a small snippet of reproducible code? It would really help in investigating this further.
Thanks
I’ve been battling this for some time now. I have a snippet from earlier, since I have reported this before.
from google import genai
from google.genai import types
import PIL.Image
from pydantic import BaseModel, Field
from typing import Optional

class StructuredOutputSchema(BaseModel):
    main_entry: str = Field(description="The main entry of the catalog card. The main entry is the text on the first line of the card, often underlined")
    bibliographic_information: str = Field(description="The full OCR text extracted from the image, including the main entry, excluding subject headings and shelf mark.")
    subject_headings: Optional[str] = Field(description="A list of subject headings found on the card. The subject headings are handwritten notes in the top left corner of the card.")
    shelf_mark: Optional[str] = Field(description="The shelf mark or call number of the book. The shelf mark or call number is a handwritten note in the top right corner of the card.")

API_KEY = "API_KEY_HERE"

generation_config = types.GenerateContentConfig(
    temperature=0.1,
    top_p=0.95,
    top_k=40,
    max_output_tokens=2000,
    response_mime_type="application/json",
    response_json_schema=StructuredOutputSchema.model_json_schema(),
    system_instruction="You are an OCR interpreter. Your task is to extract all text from the provided image of a library catalog card. Ensure that the extracted text is accurate and complete, preserving the original formatting as much as possible. Also, ensure that all characters are captured and returned correctly, even those outside the standard ASCII range. The card contains bibliographic information such as title, author, publication year, and location. The information can appear in various languages which may include special characters.",
    http_options={"timeout": 60000},
    thinking_config=types.ThinkingConfig(thinking_budget=0),
)

image_path = "003_00009.jpg"
# model = "gemini-2.5-flash"
model = "gemini-2.5-flash-preview-09-2025"

image = PIL.Image.open(image_path)
prompt = "Return the extracted text in JSON format according to the specified schema."
contents = [image, prompt]

client = genai.Client(api_key=API_KEY)
result = client.models.generate_content(model=model, contents=contents, config=generation_config)
print(result)
The referenced image is a scan of a library catalog card.
Response:
sdk_http_response=HttpResponse(
headers=<dict len=11>
) candidates=[Candidate(
content=Content(
parts=[
Part(
text="""{
"main_entry": "Achrelius, Daniel",
"bibliographic_information": "Achrelius, Daniel Memoria amplissimi viri Enevaldi Svenonii, s.s. theologiæ doctoris, professoris ejusdem facultatis primarii... solenni oratione, ab oblivione & tenebris vindicata, postridie ex-eqviarum. 8:o /Åbo/ 1689.",
"subject_headings": "<Svenonius, Enevald>",
"shelf_mark": "(Bn) Biogr. Sw."
}"""
),
],
role='model'
),
finish_reason=<FinishReason.STOP: 'STOP'>,
index=0
)] create_time=None model_version='gemini-2.5-flash-preview-09-2025' prompt_feedback=None response_id='l4pBabbPMb70xN8PyqfP2Q0' usage_metadata=GenerateContentResponseUsageMetadata(
candidates_token_count=135,
prompt_token_count=365,
prompt_tokens_details=[
ModalityTokenCount(
modality=<MediaModality.TEXT: 'TEXT'>,
token_count=107
),
ModalityTokenCount(
modality=<MediaModality.IMAGE: 'IMAGE'>,
token_count=258
),
],
thoughts_token_count=606,
total_token_count=1106
) automatic_function_calling_history=[] parsed={'main_entry': 'Achrelius, Daniel', 'bibliographic_information': 'Achrelius, Daniel Memoria amplissimi viri Enevaldi Svenonii, s.s. theologiæ doctoris, professoris ejusdem facultatis primarii... solenni oratione, ab oblivione & tenebris vindicata, postridie ex-eqviarum. 8:o /Åbo/ 1689.', 'subject_headings': '<Svenonius, Enevald>', 'shelf_mark': '(Bn) Biogr. Sw.'}
Note that thoughts_token_count is 606, even though thinking_budget was set to 0.
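For reference, this is roughly how I verify it programmatically (a minimal sketch; result is the response object from the snippet above, and I'm assuming thoughts_token_count can be None when no thinking occurred):

# Check the usage metadata of the response returned above.
# With thinking_budget=0 we would expect thoughts_token_count to be 0 or None.
usage = result.usage_metadata
thoughts = usage.thoughts_token_count or 0
print(f"prompt={usage.prompt_token_count}, candidates={usage.candidates_token_count}, "
      f"thoughts={thoughts}, total={usage.total_token_count}")
if thoughts > 0:
    print("Thinking was NOT disabled despite thinking_budget=0")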
A related issue, at least for me:
Switching to gemini-2.5-flash results in endless newline characters whenever the model tries to generate the letter “Å” (I have thousands of examples of this). I was told this was fixed by the structured output improvements in the 09-preview version. But, and this is what this thread is about, the preview version ignores thinking_budget.
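As a temporary guard against that gemini-2.5-flash failure mode, I check the raw text before parsing it; a rough sketch (the newline threshold is my own arbitrary choice, not anything from the SDK):

import json

# Detect the runaway-newline failure: a long run of consecutive newlines in the
# raw candidate text means the generation degenerated and should be retried.
raw = result.candidates[0].content.parts[0].text
if raw is None or "\n" * 20 in raw:
    raise ValueError("Degenerate output (repeated newlines); retry the request")
record = json.loads(raw)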