Gemini 2.0 thinking model returning truncated response with a blob of whitespace

I’m experimenting with the Gemini 2.0 Flash Thinking experimental model by sending a few images along with text. However, the response is truncated and contains a large block of whitespace. I double-checked the code, which works fine with the other Gemini models, so I’m not sure whether I need to add any additional parameters. Here is the code snippet that invokes the model.

        model_name = "gemini-2.0-flash-thinking-exp-01-21"
        model = genai.GenerativeModel(model_name)
        contents = build_content_parts_for_gemini(prompt, image_parts)
        request_payload = {"contents": contents}

        if "parameters" in model_config["request_format"]:
            request_payload["generation_config"] = model_config["request_format"]["parameters"]

        try:
            response = model.generate_content(**request_payload)
            response_text = response.text if response.text else None
            
            if response_text is not None:
                return response_text, model_name
            else:
                logging.warning(f"Model {model_name} returned an empty response.")
                return None, model_name

        except Exception as e:
            logging.exception(f"Error calling model {model_name}: {e}")
            return None, model_name

Looking for any tips to get this working.

Thank you

generation_config = {
    "max_output_tokens": 2048,
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 40
}

Add this to your request_payload and ensure you’re using the latest version of the Google AI SDK. The max_output_tokens parameter should help with the truncation issue.
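
For example, a minimal sketch of wiring that config into the snippet from the question, reusing the model, contents, and request_payload names defined there:

generation_config = {
    "max_output_tokens": 2048,
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 40
}

request_payload = {
    "contents": contents,  # built by build_content_parts_for_gemini above
    "generation_config": generation_config
}
response = model.generate_content(**request_payload)
print(response.text)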

When I set max_output_tokens to 2048, the empty buffer at the end of the message goes away, but the response is still truncated. Is there a better way to get the full response? The same prompt and code work fine with the gemini-exp-1206 model.

I also found something interesting: the truncation behavior is affected by the prompt format. I have two prompts, one in markdown and the other in plain text. When I use the markdown prompt, the model truncates the response and also adds a lot of gibberish (probably in a different language) to it.

generation_config = {
    "max_output_tokens": 8192,  # Increased token limit
    "temperature": 0.9,
    "stop_sequences": ["\n\n"],  # Help control response ending
    "candidate_count": 1
}

This should help avoid the gibberish output and improve response quality. The gemini-2.0-flash-thinking-exp model is still experimental, so using structured plain text prompts currently yields better results.
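
If it helps with debugging, here is a rough sketch that applies this config and also logs why the model stopped, assuming the model and contents variables from the original snippet and a version of the google.generativeai SDK that exposes finish_reason on response candidates. MAX_TOKENS indicates the reply hit max_output_tokens, while STOP means the model finished on its own or matched one of the stop_sequences:

import logging

generation_config = {
    "max_output_tokens": 8192,
    "temperature": 0.9,
    "stop_sequences": ["\n\n"],
    "candidate_count": 1
}

response = model.generate_content(contents, generation_config=generation_config)

# Log the finish reason for each candidate to see whether the truncation
# comes from the token limit, a stop sequence, or something else (e.g. safety).
for candidate in response.candidates:
    logging.info("finish_reason: %s", candidate.finish_reason)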

What is the max token limit? In this case, I’m trying to get the model to give a more predictable response, which is why I set the temperature to 0.7 originally. Also, I’m not sure what candidate_count does. Should I also remove the top_p and top_k values?
In the prompt, I ask the model to respond in markdown, but sometimes it still returns HTML.
Sorry for asking too many questions.

The typical token limit varies by model, but for most LLMs it’s between 2,048 and 4,096 tokens.
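
If you want the exact numbers for the model you’re calling, you can ask the SDK instead of guessing. A small sketch, assuming a recent google-generativeai version where genai.get_model is available:

import google.generativeai as genai

info = genai.get_model("models/gemini-2.0-flash-thinking-exp-01-21")
print(info.input_token_limit)   # maximum tokens accepted in a request
print(info.output_token_limit)  # maximum tokens the model can return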

For temperature settings:

  • Keep temperature at 0.7 for balanced creativity/consistency
  • candidate_count determines how many alternative completions to generate
  • You can remove top_p and top_k if you’re using temperature, since they’re alternative ways of controlling randomness (see the simplified config sketched below)
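
For example, a possible generation_config based on the points above (the values are illustrative, not requirements):

generation_config = {
    "max_output_tokens": 8192,  # generous limit to reduce truncation
    "temperature": 0.7,         # balanced creativity/consistency
    "candidate_count": 1        # a single completion is enough here
    # top_p and top_k omitted; temperature alone controls randomness
}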

To encourage markdown output, explicitly state the required format in your prompt (or system instruction) and validate the response format in your code. Here’s a simple example:

# Reuses the prompt, image_parts, and model variables from the snippet above.
prompt += "\n\nRespond in Markdown only. Do not use HTML tags."
response = model.generate_content(
    build_content_parts_for_gemini(prompt, image_parts),
    generation_config={"max_output_tokens": 4096, "temperature": 0.7}
)

# Basic validation: flag replies that came back as HTML instead of markdown.
if "<html" in response.text.lower() or "<div" in response.text.lower():
    logging.warning("Model returned HTML instead of markdown.")