Failing to use the API (2.5 Pro) - Why does Google need to overcomplicate things?

After creating the API key and activating a paid tier* in the confusing but clean console.cloud.google.com, I followed the instructions in the dev documentation at: https://ai.google.dev/gemini-api/docs/thinking
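For context, the basic call from that page looks roughly like this (a sketch from memory, so the exact snippet in the docs may differ; I assume the API key is exposed via the GOOGLE_API_KEY environment variable, and I keep the client_google name from my notebook):

from google import genai
from google.genai import types  # used for the config objects below

# genai.Client() picks up GOOGLE_API_KEY from the environment
client_google = genai.Client()

response = client_google.models.generate_content(
    model="gemini-2.5-pro-exp-03-25",
    contents="Explain how AI works",
)
print(response.text)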

The next step was to add the additional configs - the system prompt, temperature and max_output_tokens. As this is not shown in the tutorial, I went to the GitHub repo googleapis/python-genai and replicated the exact same code, only changing the model to the 2.5 one.

response = client_google.models.generate_content(
    model="gemini-2.5-pro-exp-03-25",
    contents='high',
    config=types.GenerateContentConfig(
        system_instruction='I say high, you say low',
        max_output_tokens=10,
        temperature=0.3,
    ),
)
print(response.text) # None

When printing the response, I got a clear and informative message:

candidates=None create_time=None response_id=None model_version='gemini-2.5-pro-exp-03-25' prompt_feedback=None usage_metadata=GenerateContentResponseUsageMetadata(cache_tokens_details=None, cached_content_token_count=None, candidates_token_count=None, candidates_tokens_details=None, prompt_token_count=10, prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=10)], thoughts_token_count=None, tool_use_prompt_token_count=None, tool_use_prompt_tokens_details=None, total_token_count=10) automatic_function_calling_history= parsed=None
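The dump at least shows where the tokens went. A hedged sketch of inspecting it programmatically, using only the fields visible in the dump above:

# No candidates at all, so response.text has nothing to read from
if not response.candidates:
    print("no candidates returned")

usage = response.usage_metadata
if usage:
    print("prompt tokens:   ", usage.prompt_token_count)      # 10
    print("thinking tokens: ", usage.thoughts_token_count)    # None
    print("candidate tokens:", usage.candidates_token_count)  # None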

Looking at the code exported from AI Studio, we can see that the class types.GenerateContentConfig is still used to instantiate the configs. In particular, we can see:

generate_content_config = types.GenerateContentConfig(
    temperature=0,
    response_mime_type="text/plain",
)

And using only those params (spoiler: removing max_output_tokens) works.
Eventually, I realized that there is a high and undocumented minimum number of tokens that needs to be met: anything below it, and response.text is None.
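For reference, here is the variant that returned text for me - the same request as before, with max_output_tokens simply omitted (a sketch; the exact reply will vary):

response = client_google.models.generate_content(
    model="gemini-2.5-pro-exp-03-25",
    contents='high',
    config=types.GenerateContentConfig(
        system_instruction='I say high, you say low',
        temperature=0.3,
    ),
)
print(response.text)  # now an actual string, e.g. 'Low'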

Coming back to the first example, it turned out that the max_output_tokens value needed to get an answer is around 115:

response = client_google.models.generate_content(
    model="gemini-2.5-pro-exp-03-25",
    contents='high',
    config=types.GenerateContentConfig(
        system_instruction='I say high, you say low',
        max_output_tokens=100,  # 100 does not work, but 130 / 150 do
        temperature=0.3,
    ),
)
print(response)
print(response.text)
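One way to locate that threshold is a crude sweep over max_output_tokens values - a sketch of such a probe (not from the docs, and the cutoff may vary from run to run, so treat the numbers as indicative):

for budget in (10, 50, 100, 115, 130, 150):
    r = client_google.models.generate_content(
        model="gemini-2.5-pro-exp-03-25",
        contents='high',
        config=types.GenerateContentConfig(
            system_instruction='I say high, you say low',
            max_output_tokens=budget,
            temperature=0.3,
        ),
    )
    # repr() makes None vs. empty string vs. text visible
    print(budget, repr(r.text))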

Furthermore, there is not a single mention of max_output_tokens in the cookbook (the Google Colab notebook).

I understand that this very promising model was only announced on March 25th, but I fail to see why Google would not prioritize clarifying such an important aspect: how max_output_tokens works in thinking mode (and returning clearer responses from the API when a request fails because of it).

Until the Google documentation gets improved: if you have a clarification or a link I missed, please share it.

(*) A paid tier is not necessary to use "gemini-2.5-pro-exp-03-25", but you do need one for "gemini-2.5-pro-preview-03-25".
