We’ve been experimenting with long requests and structured output on the Gemini 2.5 models via the Python SDK (the google.genai package). Even when setting max_output_tokens to its 65535 upper bound, though, we often receive truncated responses that are well below that limit:
from google import genai
from google.genai import types

# API_KEY, PROMPT, and SCHEMA are defined elsewhere; the call runs inside an async function.
config = types.GenerateContentConfig(
    http_options=types.HttpOptions(timeout=600000),
    temperature=0.9,
    max_output_tokens=65535,
    response_mime_type="application/json",
    response_schema=SCHEMA,
)
client = genai.Client(api_key=API_KEY)
contents = [
    types.Content(
        role="user",
        parts=[types.Part.from_text(text=PROMPT)],
    )
]
response = await client.aio.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents=contents,
    config=config,
)
Examining responses for identical prompts, we see widely varying token counts, all well under 65535:
GenerateContentResponseUsageMetadata(cached_content_token_count=None, candidates_token_count=32907, prompt_token_count=58154, total_token_count=123675)
GenerateContentResponseUsageMetadata(cached_content_token_count=None, candidates_token_count=17224, prompt_token_count=58154, total_token_count=123676)
GenerateContentResponseUsageMetadata(cached_content_token_count=None, candidates_token_count=None, prompt_token_count=58154, total_token_count=123688)
Sometimes we even get empty responses. Interestingly, finish_reason is always MAX_TOKENS, and total_token_count - prompt_token_count always lands right around the limit.
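For completeness, here is roughly how we inspect each response (a minimal sketch; attribute names follow the google.genai response objects, and the usage_metadata fields can be None):

usage = response.usage_metadata
print("finish_reason:", response.candidates[0].finish_reason)  # always MAX_TOKENS for us
print("candidates:", usage.candidates_token_count)             # e.g. 32907
print("prompt:", usage.prompt_token_count)                     # 58154
print("total - prompt:", usage.total_token_count - usage.prompt_token_count)
# e.g. 123675 - 58154 = 65521, right at the 65535 cap, even though
# candidates_token_count is only 32907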
What explains this behavior? Are we actually hitting the output-token limit, even though candidates_token_count is well below 65535? Since we’re requesting schema-constrained JSON, a truncated response means malformed JSON. Is there anything we can do to work around this and get properly formed JSON responses back?
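For reference, this is roughly how the truncation surfaces downstream (a minimal sketch; json.loads stands in for our actual parsing, and response.text is the SDK’s convenience accessor for the text parts):

import json

try:
    data = json.loads(response.text)  # response.text is None for the empty responses
except (TypeError, json.JSONDecodeError):
    # Truncated output stops mid-document, so the schema-constrained JSON fails to parse.
    data = None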