Truncated responses despite being under limits

We’ve been experimenting with long requests and structured output on the Gemini 2.5 models, via the Python SDK (google.genai package). Even with max_output_tokens set to the 65535 upper bound on output tokens, we often receive truncated responses that are well below the limit:

from google import genai
from google.genai import types

config = types.GenerateContentConfig(
    http_options=types.HttpOptions(timeout=600000),
    temperature=0.9,
    max_output_tokens=65535,
    response_mime_type="application/json",
    response_schema=SCHEMA,
)

client = genai.Client(api_key=API_KEY)
contents = [
    types.Content(
        role="user",
        parts=[types.Part.from_text(text=PROMPT)],
    )
]
response = await client.aio.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents=contents,
    config=config,
)

Examining responses for identical prompts, we see varying token counts that are well under 65535:

GenerateContentResponseUsageMetadata(cached_content_token_count=None, candidates_token_count=32907, prompt_token_count=58154, total_token_count=123675
GenerateContentResponseUsageMetadata(cached_content_token_count=None, candidates_token_count=17224, prompt_token_count=58154, total_token_count=123676
GenerateContentResponseUsageMetadata(cached_content_token_count=None, candidates_token_count=None, prompt_token_count=58154, total_token_count=123688

Sometimes we even get empty responses. Interestingly, finish_reason is always MAX_TOKENS, and total_token_count - prompt_token_count is always near the limit.
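To make that arithmetic explicit, here is a small sketch using the numbers from the three usage records above. The dictionary keys and helper names are made up for illustration. Our guess, which we have not confirmed, is that the tokens not counted in candidates_token_count are thinking tokens that still count against the output limit.

```python
# Usage numbers copied from the three responses above.
# The dict keys and helper names are illustrative, not SDK fields.
MAX_OUTPUT_TOKENS = 65535

usages = [
    {"candidates": 32907, "prompt": 58154, "total": 123675},
    {"candidates": 17224, "prompt": 58154, "total": 123676},
    {"candidates": None, "prompt": 58154, "total": 123688},
]


def non_prompt_tokens(usage):
    """Everything the model produced: total minus prompt."""
    return usage["total"] - usage["prompt"]


def unaccounted_tokens(usage):
    """Produced tokens that do not appear in candidates_token_count."""
    return non_prompt_tokens(usage) - (usage["candidates"] or 0)


for u in usages:
    produced = non_prompt_tokens(u)
    print(produced, unaccounted_tokens(u), MAX_OUTPUT_TOKENS - produced)
```

In all three cases the produced total lands within 14 tokens of 65535, even though the visible candidate output is half that or less.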

What explains this behavior? Are we actually already running up against output limits, even though the response doesn’t seem to reflect that? Given that we’re getting schematized responses back, truncated responses mean malformed JSON. Is there anything we can do to work around this and get properly-formed JSON responses back?
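As a stopgap, we can at least detect the truncation before handing the text to a JSON parser. This is only a minimal sketch: parse_if_complete is a hypothetical helper of ours, and it assumes the caller passes the finish reason as a plain string name (for example "STOP" or "MAX_TOKENS") along with the response text. It does not fix the underlying problem.

```python
import json


def parse_if_complete(finish_reason, text):
    """Return the parsed JSON, or None if the response looks truncated or empty.

    Hypothetical helper: finish_reason is assumed to be the finish reason's
    name as a string, and text the raw response text (possibly None).
    """
    if finish_reason != "STOP" or not text:
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Even a response reported as complete may not be valid JSON.
        return None
```

A caller could retry or log the usage metadata whenever this returns None, instead of crashing on malformed JSON.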


Hi @jdtan, welcome to the forum.

Is this happening with the new Gemini 2.5 versions, gemini-2.5-pro-preview-06-05 or gemini-2.5-flash-preview-05-20?


It’s happening on the latest flash version. There is a big thread, "Truncated Response Issue with Gemini 2.5 Flash Preview" (#31 by Emir_Arditi), where several people report the same issue.