I’m experimenting with the Gemini 2.0 Flash Thinking experimental model by sending a few images along with text. However, the response is truncated and contains a large chunk of white space. I double-checked the code, which works with the other Gemini models, and I’m not sure if I need to add any additional parameters. Here is the code snippet that invokes the model.
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-01-21")
contents = build_content_parts_for_gemini(prompt, image_parts)
request_payload = {"contents": contents}
# Pass through any generation parameters configured for this model.
if "parameters" in model_config["request_format"]:
    request_payload["generation_config"] = model_config["request_format"]["parameters"]
try:
    response = model.generate_content(**request_payload)
    response_text = response.text if response.text else None
    if response_text is not None:
        return response_text, model_name
    else:
        logging.warning(f"Model {model_name} returned an empty response.")
        return None, model_name
except Exception as e:
    logging.exception(f"Error calling model {model_name}: {e}")
    return None, model_name
Add a max_output_tokens setting to the generation_config in your request_payload, and make sure you’re using the latest version of the Google AI SDK. The max_output_tokens parameter should help with the truncation issue.
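For reference, a minimal sketch of that change with the google.generativeai SDK; the 8192 limit here is only an illustrative assumption, so use whatever ceiling fits your responses:

# Sketch: raise the output ceiling via generation_config.
# max_output_tokens=8192 is an assumed value, for illustration only.
generation_config = genai.GenerationConfig(max_output_tokens=8192)
request_payload = {
    "contents": contents,  # built the same way as in your snippet
    "generation_config": generation_config,
}
response = model.generate_content(**request_payload)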
When I set max_output_tokens to 2048, it no longer has the empty buffer at the end of the message, but it still truncates the response. Is there a better way to get the full response? The same prompt and code work fine with the gemini-exp-1206 model.
I also found something interesting. The truncation behavior is affected by the prompt format. I have two prompts: one in Markdown format and one in plain text. When I use the Markdown prompt, it truncates the response and also adds a lot of gibberish (probably in a different language) to the response.
Sticking with the plain-text prompt should help avoid the gibberish output and improve response quality. The gemini-2.0-flash-thinking-exp model is still experimental, so structured plain-text prompts currently yield better results.
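As a rough illustration (the task wording below is purely hypothetical), the same instructions can be expressed as structured plain text rather than Markdown headings and bullets:

# Hypothetical plain-text prompt; the task description is invented for illustration.
plain_text_prompt = (
    "Task: Describe the objects visible in each attached image.\n"
    "Constraints:\n"
    "1. Respond in English.\n"
    "2. Keep each description under 50 words.\n"
    "Output format: one numbered paragraph per image.\n"
)
contents = build_content_parts_for_gemini(plain_text_prompt, image_parts)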
What is the max token limit? In this case, I’m trying to get the model to provide a more predictable response, and that’s the reason I set it to 0.7 originally. Also, I’m not sure what candidate_count does. Should I also remove the top_p and top_k values?
In the prompt, I ask the model to give the response in Markdown, but sometimes it still responds in HTML format.
Sorry for asking too many questions.