To replicate:
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro-exp-03-25:generateContent?key=MY_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "contents": [
      {
        "parts": [
          {
            "text": "could"
          }
        ]
      }
    ],
    "generationConfig": {
      "temperature": 0.1,
      "maxOutputTokens": 411
    }
  }'
Response:
{
  "usageMetadata": {
    "promptTokenCount": 1,
    "totalTokenCount": 1,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 1
      }
    ]
  },
  "modelVersion": "gemini-2.5-pro-exp-03-25"
}
This is not the same behavior as you get from Gemini 2.0 Flash:
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "Could you please provide more context? I need to know what you want me to do with the word \"could\". For example, are you asking me to:\n\n* **Define it?** (e.g., \"Could you define 'could'?\")\n* **Use it in a sentence?"
          }
        ],
        "role": "model"
      },
      "finishReason": "MAX_TOKENS",
      "avgLogprobs": -0.13195424001724992
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 1,
    "candidatesTokenCount": 61,
    "totalTokenCount": 62,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 1
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 61
      }
    ]
  },
  "modelVersion": "gemini-2.0-flash"
}
With 2.0 Flash, we get a candidate with "finishReason": "MAX_TOKENS", allowing us to determine why we got a truncated/missing response. This is easy to parse.
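As a minimal illustration (my own sketch in Python, just parsing the raw JSON bodies shown above), the kind of check that works against the 2.0 Flash response fails against the 2.5 Pro Experimental one:

import json

def finish_reason(raw_body: str) -> str:
    """Return the finishReason of the first candidate in a generateContent response."""
    response = json.loads(raw_body)
    # 2.0 Flash response above: returns "MAX_TOKENS".
    # 2.5 Pro Experimental response above: there is no "candidates" key
    # at all, so this raises KeyError.
    return response["candidates"][0]["finishReason"]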
If I increase the maxOutputTokens value a little bit more, Gemini 2.5 Pro will respond correctly, but the output is truncated after just a few tokens. I'm guessing it spends a few hundred tokens on CoT / reasoning before generating response tokens, and if the model hits the token limit during this thinking phase, it fails to generate a candidate and returns the candidate-free response shown above, which gives the caller no way to tell what went wrong.
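Until that's fixed, the only client-side workaround appears to be guessing: the sketch below (my own assumption, since the API reports nothing) treats a candidate-free response as an implicit MAX_TOKENS.

import json

def effective_finish_reason(raw_body: str) -> str:
    """Best-effort finishReason that tolerates the missing-candidates case."""
    response = json.loads(raw_body)
    candidates = response.get("candidates") or []
    if not candidates:
        # No candidate at all: with 2.5 Pro Experimental this currently seems
        # to mean the token budget was exhausted during thinking, but the
        # response gives no way to confirm that, so this is an assumption.
        return "MAX_TOKENS"
    # "UNKNOWN" is just a local placeholder, not an API value.
    return candidates[0].get("finishReason", "UNKNOWN")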
Expected behavior would be something like a candidate with an empty content object and a finishReason of MAX_TOKENS.
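Concretely, something along these lines (illustrative only; I'm assuming the shape would mirror the 2.0 Flash response above):

{
  "candidates": [
    {
      "content": {},
      "finishReason": "MAX_TOKENS"
    }
  ],
  "usageMetadata": { ... },
  "modelVersion": "gemini-2.5-pro-exp-03-25"
}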