Heya,
I'm currently using the Python google.genai package to interact with Gemini and wanted to set a limit on how many tokens can be used for generating a response, but with the model gemini-2.5-flash and max_output_tokens=1, the max_output_tokens setting doesn't seem to be taken into account.
With gemini-2.0-flash it does work, returning a finish_reason of MAX_TOKENS.
I'm completely lost as to what might be the cause. Is someone available to help out?
This is the code I’ve used to test it:
Code
import pprint
from google import genai
from google.genai import types
from pydantic import BaseModel
class ObjectToProcess(BaseModel):
    product_id: str
    field_to_use: str
    field_text: str

class ProcessedObject(BaseModel):
    product_id: str
    field_used: str
    text_generated: str
client = genai.Client()
gemini_conf = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_budget=0),  # Disables thinking
    response_mime_type="application/json",
    response_schema=list[ProcessedObject],
    max_output_tokens=1,
)
product_object = [
    ObjectToProcess(product_id="00001", field_to_use="title", field_text="white nike t-shirt"),
    ObjectToProcess(product_id="00002", field_to_use="title", field_text="beige nike t-shirt"),
    ObjectToProcess(product_id="00003", field_to_use="title", field_text="black uniqlo linen shirt"),
    ObjectToProcess(product_id="00004", field_to_use="title", field_text="jeans death stranding special edition"),
    ObjectToProcess(product_id="00005", field_to_use="title", field_text="marketer red leather jacket"),
]
prompt = f"Take the following list of items {product_object} which includes the product id (which is referred to as id) and a title (referred to as title). Generate a description from the title."
response = client.models.generate_content(
    model="gemini-2.0-flash",  # the same call with "gemini-2.5-flash" is where max_output_tokens gets ignored
    config=gemini_conf,
    contents=prompt,
)
pprint.pprint(response)
print(response.candidates[0].finish_reason)
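For reference, a stripped-down side-by-side comparison of the two models would look roughly like the sketch below: same token limit, both models, then compare finish_reason and usage_metadata. I'm assuming response.usage_metadata and its thoughts_token_count field are exposed this way in the current SDK, so treat it as a sketch rather than verified code.
Code
import pprint
from google import genai
from google.genai import types

client = genai.Client()

for model_name in ["gemini-2.0-flash", "gemini-2.5-flash"]:
    config = types.GenerateContentConfig(
        max_output_tokens=1,
        # thinking_budget only applies to the 2.5 models, so the
        # thinking_config is left out for gemini-2.0-flash.
        thinking_config=(
            types.ThinkingConfig(thinking_budget=0)
            if model_name.startswith("gemini-2.5")
            else None
        ),
    )
    response = client.models.generate_content(
        model=model_name,
        config=config,
        contents="Describe a white nike t-shirt in one sentence.",
    )
    # If the limit is enforced, both models should stop with MAX_TOKENS.
    print(model_name, response.candidates[0].finish_reason)
    # usage_metadata shows where the tokens went; thoughts_token_count
    # (if present in your SDK version) would reveal tokens spent on thinking.
    pprint.pprint(response.usage_metadata)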
Thanks in advance!
Dediu