Failing to use the API (2.5 Pro) - Why does Google need to overcomplicate things?

After creating the API key and activating a paid tier* in the confusing but clean console.cloud.google.com, I followed the instructions in the dev documentation at: https://ai.google.dev/gemini-api/docs/thinking
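For context, the basic call from that page looks roughly like this (a sketch from memory, so the exact snippet in the docs may differ; I assume the API key is exposed via the GOOGLE_API_KEY environment variable, and I keep the client_google name from my notebook):

from google import genai
from google.genai import types  # used for the config objects below

# genai.Client() picks up GOOGLE_API_KEY from the environment
client_google = genai.Client()

response = client_google.models.generate_content(
    model="gemini-2.5-pro-exp-03-25",
    contents="Explain how AI works",
)
print(response.text)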

The next step was to add the additional configs - the system prompt, temperature and max_output_tokens. As this is not shown in the tutorial, I went to the GitHub repo googleapis/python-genai and replicated the exact same code, only changing the model to the 2.5 one.

response = client_google.models.generate_content(
    model="gemini-2.5-pro-exp-03-25",
    contents='high',
    config=types.GenerateContentConfig(
        system_instruction='I say high, you say low',
        max_output_tokens=10,
        temperature=0.3,
    ),
)
print(response.text) # None

When printing the response, I got a clear and informative message:

candidates=None create_time=None response_id=None model_version='gemini-2.5-pro-exp-03-25' prompt_feedback=None usage_metadata=GenerateContentResponseUsageMetadata(cache_tokens_details=None, cached_content_token_count=None, candidates_token_count=None, candidates_tokens_details=None, prompt_token_count=10, prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=10)], thoughts_token_count=None, tool_use_prompt_token_count=None, tool_use_prompt_tokens_details=None, total_token_count=10) automatic_function_calling_history= parsed=None
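The dump at least shows where the tokens went. A hedged sketch of inspecting it programmatically, using only the fields visible in the dump above:

# No candidates at all, so response.text has nothing to read from
if not response.candidates:
    print("no candidates returned")

usage = response.usage_metadata
if usage:
    print("prompt tokens:   ", usage.prompt_token_count)      # 10
    print("thinking tokens: ", usage.thoughts_token_count)    # None
    print("candidate tokens:", usage.candidates_token_count)  # None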

Looking at the code exported from AI Studio, we can see that the class types.GenerateContentConfig is still used to instantiate the configs. In particular, we can see:

generate_content_config = types.GenerateContentConfig(
    temperature=0,
    response_mime_type="text/plain",
)

And using only those params (spoiler: removing max_output_tokens) works.
Eventually, I realized that there is a high and undocumented minimum number of tokens that needs to be met: anything below it, and response.text is None.
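For reference, here is the variant that returned text for me - the same request as before, with max_output_tokens simply omitted (a sketch; the exact reply will vary):

response = client_google.models.generate_content(
    model="gemini-2.5-pro-exp-03-25",
    contents='high',
    config=types.GenerateContentConfig(
        system_instruction='I say high, you say low',
        temperature=0.3,
    ),
)
print(response.text)  # now an actual string, e.g. 'Low'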

Coming back to the first example, it turned out that the max_output_tokens value needed to get an answer is around 115:

response = client_google.models.generate_content(
    model="gemini-2.5-pro-exp-03-25",
    contents='high',
    config=types.GenerateContentConfig(
        system_instruction='I say high, you say low',
        max_output_tokens=100,  # 100 does not work, but 130 / 150 do
        temperature=0.3,
    ),
)
print(response)
print(response.text)
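One way to locate that threshold is a crude sweep over max_output_tokens values - a sketch of such a probe (not from the docs, and the cutoff may vary from run to run, so treat the numbers as indicative):

for budget in (10, 50, 100, 115, 130, 150):
    r = client_google.models.generate_content(
        model="gemini-2.5-pro-exp-03-25",
        contents='high',
        config=types.GenerateContentConfig(
            system_instruction='I say high, you say low',
            max_output_tokens=budget,
            temperature=0.3,
        ),
    )
    # repr() makes None vs. empty string vs. text visible
    print(budget, repr(r.text))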

Furthermore, there is not a single mention of max_output_tokens in the cookbook (the Google Colab notebook).

I understand that this very promising model was only announced on March 25th, but I fail to see why Google would not prioritize clarifying such an important aspect: how max_output_tokens works in thinking mode (and returning clearer responses from the API when a request fails because of it).

Until the Google documentation gets improved: if you have a clarification or a link I missed, please share it.

(*) A paid tier is not necessary to use "gemini-2.5-pro-exp-03-25", but you do need one for "gemini-2.5-pro-preview-03-25".
