After creating the API key and activating a paid tier* in the confusing but clean console.cloud.google.com, I followed the instructions in the dev documentation at: https://ai.google.dev/gemini-api/docs/thinking
The next step was to add the additional configs, like the system prompt, temperature, and max_output_tokens. As this is not shown in the tutorial, I went to the GitHub repo googleapis/python-genai.
I replicated the exact same code, only changing the model to the 2.5 one.
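For reference, this is the client setup the snippets below assume (the client_google name is mine, and the placeholder key stands for the one created above):

from google import genai
from google.genai import types

# Standard google-genai client setup; the key comes from the console step above
client_google = genai.Client(api_key="YOUR_API_KEY")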
response = client_google.models.generate_content(
    model="gemini-2.5-pro-exp-03-25",
    contents='high',
    config=types.GenerateContentConfig(
        system_instruction='I say high, you say low',
        max_output_tokens=10,
        temperature=0.3,
    ),
)
print(response.text)  # None
When printing the full response object instead, I got this clear and informative message:
candidates=None create_time=None response_id=None model_version='gemini-2.5-pro-exp-03-25' prompt_feedback=None usage_metadata=GenerateContentResponseUsageMetadata(cache_tokens_details=None, cached_content_token_count=None, candidates_token_count=None, candidates_tokens_details=None, prompt_token_count=10, prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=10)], thoughts_token_count=None, tool_use_prompt_token_count=None, tool_use_prompt_tokens_details=None, total_token_count=10) automatic_function_calling_history=[] parsed=None
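Buried in there are the only useful bits: there is no candidate at all, and the usage metadata shows where the budget went. Here is the sanity check I now run before trusting response.text; the fields (prompt_token_count, thoughts_token_count, candidates_token_count, finish_reason) are real attributes visible in the dump above, but the helper itself is mine:

# Hypothetical helper (my own scaffolding, not from the SDK):
# inspect usage_metadata and candidates before trusting response.text
def explain_empty_response(response):
    if response.text is not None:
        print("OK:", response.text)
        return
    usage = response.usage_metadata
    print("No text returned.")
    print("  prompt tokens:  ", usage.prompt_token_count)
    print("  thinking tokens:", usage.thoughts_token_count)
    print("  answer tokens:  ", usage.candidates_token_count)
    if response.candidates:
        # finish_reason is MAX_TOKENS when the cap ran out mid-generation
        print("  finish_reason:  ", response.candidates[0].finish_reason)

explain_empty_response(response)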
When looking at the code from AI Studio, we can see that the class types.GenerateContentConfig is still being used to instantiate the configs. In particular, we can see:
generate_content_config = types.GenerateContentConfig(
    temperature=0,
    response_mime_type="text/plain",
)
And using only those params (spoiler: removing max_output_tokens is what matters) works, as shown below.
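Spelled out, the same call from before succeeds once the token cap is gone (identical snippet minus one line; the exact answer will vary):

response = client_google.models.generate_content(
    model="gemini-2.5-pro-exp-03-25",
    contents='high',
    config=types.GenerateContentConfig(
        system_instruction='I say high, you say low',
        temperature=0.3,  # no max_output_tokens this time
    ),
)
print(response.text)  # an actual answer, e.g. "low"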
Eventually, I realized that there is a high and undocumented number of tokens that needs to be budgeted for. Anything below it, and response.text is None. My best explanation: in thinking mode, the model's internal reasoning tokens appear to count against max_output_tokens, so if the cap is exhausted before any visible text is produced, you get nothing back.
Coming back to the previous example, it turned out that the max_output_tokens needed to get an answer is around 115. This:
response = client_google.models.generate_content(
    model="gemini-2.5-pro-exp-03-25",
    contents='high',
    config=types.GenerateContentConfig(
        system_instruction='I say high, you say low',
        max_output_tokens=100,  # This will not work, but 130 / 150 will
        temperature=0.3,
    ),
)
print(response)
print(response.text)
Furthermore, there is not a single mention of max_output_tokens in the cookbook (the Google Colab notebook).
I understand that this very promising model was only announced on March 25th, but I fail to see why Google would not prioritize clarifying such an important aspect: how max_output_tokens works in thinking mode (and returning clearer responses from the API when a call fails because of it).
If you have a clarification or a link I missed, please share it while we wait for the Google documentation to improve.
(*) A paid tier is not necessary to use 'gemini-2.5-pro-exp-03-25', but you do need it for 'gemini-2.5-pro-preview-03-25'.