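When maxOutputTokens is set for Gemini 2.5 family models, the API response can come back without the most important part: the model's answer. In the example below, the text part is empty, finishReason is MAX_TOKENS, and thoughtsTokenCount is 99, so thinking tokens appear to have consumed almost the entire 100-token budget.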
Gemini 2.5 Pro/Flash (example shown with gemini-2.5-pro):
danielkucal@DK ~ % curl -X POST \
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent' \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "text": "Emphasize how important is API compliance."
          }
        ]
      }
    ],
    "generationConfig": {
      "maxOutputTokens": 100
    }
  }'
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": ""
          }
        ],
        "role": "model"
      },
      "finishReason": "MAX_TOKENS",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 10,
    "totalTokenCount": 109,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 10
      }
    ],
    "thoughtsTokenCount": 99
  },
  "modelVersion": "gemini-2.5-pro",
  "responseId": "HgqOaPfUJLDWvdIPouiJmA4"
}
For comparison, older Gemini models (2.0 and 1.5 tested) return the truncated text for the same request:
danielkucal@DK ~ % curl -X POST \
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent' \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "text": "Emphasize how important is API compliance."
          }
        ]
      }
    ],
    "generationConfig": {
      "maxOutputTokens": 100
    }
  }'
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "## The Utmost Importance of API Compliance: Ensuring Seamless Integration and Reliability\n\nAPI (Application Programming Interface) compliance is not just a nice-to-have; it's **absolutely critical** for modern software development and integration. It's the foundation upon which stable, reliable, and interoperable systems are built. Ignoring API compliance is akin to building a house with faulty blueprints – sooner or later, things will fall apart.\n\nHere's why API compliance is so important, with emphasis on"
          }
        ],
        "role": "model"
      },
      "finishReason": "MAX_TOKENS",
      "avgLogprobs": -0.44521816253662111
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 9,
    "candidatesTokenCount": 100,
    "totalTokenCount": 109,
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 9
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 100
      }
    ]
  },
  "modelVersion": "gemini-2.0-flash",
  "responseId": "TAqOaO6UMoHkx_APu-SDwAc"
}
The same happens with the OpenAI-compatible endpoint (https://generativelanguage.googleapis.com/v1beta/openai), whereas OpenAI itself returns a truncated model response.
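For anyone who needs to tell the two cases apart on the client side, here is a minimal Python sketch that classifies a generateContent response using only fields visible in the outputs above (finishReason, the text parts, and thoughtsTokenCount). The function name diagnose_truncation and the returned labels are hypothetical, chosen for illustration, and the sample dictionaries are trimmed copies of the two responses in this report.

```python
from typing import Any


def diagnose_truncation(response: dict[str, Any]) -> str:
    """Classify a generateContent response that may have hit maxOutputTokens.

    Hypothetical helper: the labels it returns are not part of any API.
    """
    candidate = response["candidates"][0]
    parts = candidate.get("content", {}).get("parts", [])
    # Join all text parts; a missing "text" key counts as empty.
    text = "".join(part.get("text", "") for part in parts)
    usage = response.get("usageMetadata", {})
    thoughts = usage.get("thoughtsTokenCount", 0)

    if candidate.get("finishReason") != "MAX_TOKENS":
        return "complete"
    if not text and thoughts > 0:
        return "empty: budget consumed by thinking tokens"
    if not text:
        return "empty: truncated with no visible text"
    return "truncated: partial text returned"


# Trimmed copy of the gemini-2.5-pro response from this report.
gemini_25 = {
    "candidates": [
        {
            "content": {"parts": [{"text": ""}], "role": "model"},
            "finishReason": "MAX_TOKENS",
        }
    ],
    "usageMetadata": {"promptTokenCount": 10, "thoughtsTokenCount": 99},
}

# Trimmed copy of the gemini-2.0-flash response (text shortened here).
gemini_20 = {
    "candidates": [
        {
            "content": {
                "parts": [{"text": "## The Utmost Importance of API Compliance"}],
                "role": "model",
            },
            "finishReason": "MAX_TOKENS",
        }
    ],
    "usageMetadata": {"promptTokenCount": 9, "candidatesTokenCount": 100},
}

print(diagnose_truncation(gemini_25))  # empty: budget consumed by thinking tokens
print(diagnose_truncation(gemini_20))  # truncated: partial text returned
```

This only detects the situation; it does not recover the missing text, since the 2.5 response contains none.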