Regression bug in 2.5 family: no model response when limiting output tokens

DanielKucal · August 2, 2025, 1:09pm

When output tokens are limited for Gemini 2.5 family models, the API response lacks the most important thing: the model response.

Gemini 2.5 Pro/Flash

danielkucal@DK ~ % curl -X POST \
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent' \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "text": "Emphasize how important is API compliance."
          }
        ]
      }
    ],
    "generationConfig": {
      "maxOutputTokens": 100
    }
  }'
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": ""
          }
        ],
        "role": "model"
      },
      "finishReason": "MAX_TOKENS",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 10,
    "totalTokenCount": 109,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 10
      }
    ],
    "thoughtsTokenCount": 99
  },
  "modelVersion": "gemini-2.5-pro",
  "responseId": "HgqOaPfUJLDWvdIPouiJmA4"
}

Older Gemini models (tested 2.0 and 1.5) for comparison:

danielkucal@DK ~ % curl -X POST \
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent' \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "text": "Emphasize how important is API compliance."
          }
        ]
      }
    ],
    "generationConfig": {
      "maxOutputTokens": 100
    }
  }'
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "## The Utmost Importance of API Compliance: Ensuring Seamless Integration and Reliability\n\nAPI (Application Programming Interface) compliance is not just a nice-to-have; it's **absolutely critical** for modern software development and integration. It's the foundation upon which stable, reliable, and interoperable systems are built. Ignoring API compliance is akin to building a house with faulty blueprints – sooner or later, things will fall apart.\n\nHere's why API compliance is so important, with emphasis on"
          }
        ],
        "role": "model"
      },
      "finishReason": "MAX_TOKENS",
      "avgLogprobs": -0.44521816253662111
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 9,
    "candidatesTokenCount": 100,
    "totalTokenCount": 109,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 9
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 100
      }
    ]
  },
  "modelVersion": "gemini-2.0-flash",
  "responseId": "TAqOaO6UMoHkx_APu-SDwAc"
}

The same occurs with the OpenAI-compatible endpoint (https://generativelanguage.googleapis.com/v1beta/openai), whereas OpenAI itself returns a truncated model response.

Richard_Davey · August 2, 2025, 3:09pm

100 tokens for Pro is nothing. It probably exhausted the limit in the thinking stage alone and had nothing left for the actual response. Try it with 2.5 Flash with thinking disabled and you’ll get 100 tokens of output. You can’t disable thinking for Pro, and the minimum thinking budget you can set for Pro is 128, so nothing is left for the text response.
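
To make the token accounting concrete, here is a rough Python sketch. It assumes (as the responses in the first post suggest, but I have not confirmed beyond that) that thinking tokens count against `maxOutputTokens`. The helper names `build_request` and `token_breakdown` are made up for illustration; the `thinkingConfig.thinkingBudget` field is the request setting for the thinking budget, and a value of 0 is what disables thinking on 2.5 Flash.

```python
def build_request(prompt: str, max_output_tokens: int, thinking_budget: int) -> dict:
    """Build a generateContent request body with an explicit thinking budget.

    Hypothetical helper: a budget of 0 is meant for 2.5 Flash (thinking off);
    per the reply above, Pro does not accept a budget below 128.
    """
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "maxOutputTokens": max_output_tokens,
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


def token_breakdown(usage: dict) -> dict:
    """Split usageMetadata into thinking tokens and visible answer tokens.

    Fields that are absent from the response are treated as 0.
    """
    thinking = usage.get("thoughtsTokenCount", 0)
    visible = usage.get("candidatesTokenCount", 0)
    return {"thinking": thinking, "visible": visible, "output_total": thinking + visible}


# usageMetadata values copied from the two responses in the first post.
usage_25_pro = {"promptTokenCount": 10, "totalTokenCount": 109, "thoughtsTokenCount": 99}
usage_20_flash = {"promptTokenCount": 9, "candidatesTokenCount": 100, "totalTokenCount": 109}

print(token_breakdown(usage_25_pro))    # {'thinking': 99, 'visible': 0, 'output_total': 99}
print(token_breakdown(usage_20_flash))  # {'thinking': 0, 'visible': 100, 'output_total': 100}

# Body to try against gemini-2.5-flash with thinking disabled.
flash_body = build_request("Emphasize how important is API compliance.", 100, 0)
```

For the 2.5 Pro response, 99 of the 100 allowed tokens went to thinking and none to visible text, which matches the empty `text` part and the `MAX_TOKENS` finish reason.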
