Regression bug in 2.5 family: no model response when limiting output tokens

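When maxOutputTokens is set for a Gemini 2.5 family model, the API response can come back without the most important part: the model's text. In the example below, the candidate's text is an empty string and finishReason is MAX_TOKENS.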

Gemini 2.5 Pro/Flash:

danielkucal@DK ~ % curl -X POST \
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent' \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "text": "Emphasize how important is API compliance."
          }
        ]
      }
    ],
    "generationConfig": {
      "maxOutputTokens": 100
    }
  }'
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": ""
          }
        ],
        "role": "model"
      },
      "finishReason": "MAX_TOKENS",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 10,
    "totalTokenCount": 109,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 10
      }
    ],
    "thoughtsTokenCount": 99
  },
  "modelVersion": "gemini-2.5-pro",
  "responseId": "HgqOaPfUJLDWvdIPouiJmA4"
}

Older Gemini models (tested 2.0 and 1.5) for comparison:

danielkucal@DK ~ % curl -X POST \
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent' \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "text": "Emphasize how important is API compliance."
          }
        ]
      }
    ],
    "generationConfig": {
      "maxOutputTokens": 100
    }
  }'
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "## The Utmost Importance of API Compliance: Ensuring Seamless Integration and Reliability\n\nAPI (Application Programming Interface) compliance is not just a nice-to-have; it's **absolutely critical** for modern software development and integration. It's the foundation upon which stable, reliable, and interoperable systems are built. Ignoring API compliance is akin to building a house with faulty blueprints – sooner or later, things will fall apart.\n\nHere's why API compliance is so important, with emphasis on"
          }
        ],
        "role": "model"
      },
      "finishReason": "MAX_TOKENS",
      "avgLogprobs": -0.44521816253662111
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 9,
    "candidatesTokenCount": 100,
    "totalTokenCount": 109,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 9
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 100
      }
    ]
  },
  "modelVersion": "gemini-2.0-flash",
  "responseId": "TAqOaO6UMoHkx_APu-SDwAc"
}

The same happens with the OpenAI-compatible endpoint (https://generativelanguage.googleapis.com/v1beta/openai), whereas OpenAI itself returns a truncated model response in this situation.

100 tokens is nothing for Pro. It probably exhausted the whole limit in the thinking stage and had nothing left for the actual response. The usageMetadata in the 2.5 Pro output supports this: thoughtsTokenCount is 99 and there is no candidatesTokenCount at all. Try it with 2.5 Flash with thinking disabled and you’ll get 100 tokens of output. You can’t disable thinking for Pro, and the minimum thinking budget you can set for Pro is 128, so a 100-token limit leaves nothing for the text response.
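
To make that check repeatable, here is a minimal Python sketch, not an official client. The helper names (visible_output_tokens, thinking_ate_budget, flash_request_without_thinking) are made up for illustration, and the two sample dicts are trimmed copies of the responses posted above. It assumes that thinking on 2.5 Flash is disabled by setting generationConfig.thinkingConfig.thinkingBudget to 0 in the request body; check the current API docs before relying on that.

```python
def visible_output_tokens(usage: dict) -> int:
    """Tokens that reached the visible answer; a missing field counts as 0."""
    return usage.get("candidatesTokenCount", 0)


def thinking_ate_budget(response: dict) -> bool:
    """True when the response stopped at MAX_TOKENS with no visible text
    while thinking tokens were spent."""
    usage = response.get("usageMetadata", {})
    candidate = response["candidates"][0]
    parts = candidate.get("content", {}).get("parts", [])
    text = "".join(part.get("text", "") for part in parts)
    return (
        candidate.get("finishReason") == "MAX_TOKENS"
        and not text
        and usage.get("thoughtsTokenCount", 0) > 0
    )


def flash_request_without_thinking(prompt: str, max_output_tokens: int) -> dict:
    """Request body for 2.5 Flash with thinking turned off.
    Assumption: thinkingBudget = 0 disables thinking on Flash (not on Pro)."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "maxOutputTokens": max_output_tokens,
            "thinkingConfig": {"thinkingBudget": 0},
        },
    }


# Trimmed copy of the gemini-2.5-pro response from the report.
pro_response = {
    "candidates": [{
        "content": {"parts": [{"text": ""}], "role": "model"},
        "finishReason": "MAX_TOKENS",
    }],
    "usageMetadata": {
        "promptTokenCount": 10,
        "totalTokenCount": 109,
        "thoughtsTokenCount": 99,
    },
}

# Trimmed copy of the gemini-2.0-flash response (answer text shortened).
flash_20_response = {
    "candidates": [{
        "content": {
            "parts": [{"text": "## The Utmost Importance of API Compliance"}],
            "role": "model",
        },
        "finishReason": "MAX_TOKENS",
    }],
    "usageMetadata": {
        "promptTokenCount": 9,
        "candidatesTokenCount": 100,
        "totalTokenCount": 109,
    },
}

print(thinking_ate_budget(pro_response))       # True
print(thinking_ate_budget(flash_20_response))  # False
```

The dict returned by flash_request_without_thinking can be serialized with json.dumps and sent as the -d body of the same curl call, pointed at a 2.5 Flash model instead of Pro.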