Gemini Batch API Invalid JSON with response_json_schema
I’m currently encountering an issue when running Batch API requests with response_json_schema, where the model returns invalid/truncated JSON responses.
Here is the request template I’m using:
"request": {
    "contents": [
        {
            "role": "user",
            "parts": parts
        }
    ],
    "model": f"models/{Config.GEMINI_MODEL_NAME}",
    "generation_config": {
        "response_mime_type": "application/json",
        "response_json_schema": flat_asset_schema,
        "temperature": 0.1,
        "max_output_tokens": 8192
    }
}
The response often ends with:
"finishReason": "MAX_TOKENS"
and the JSON becomes incomplete/invalid because the generation is truncated before the structure is closed.
Example response:
{
    "finishReason": "MAX_TOKENS",
    "content": {
        "parts": [
            {
                "text": "{ ... truncated JSON ... }"
            }
        ]
    }
}
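To at least catch these cases before they reach downstream code, a check along the following lines could be applied to each candidate. This is only a sketch: parse_candidate is a hypothetical helper name, the candidate shape is copied from the example response above, and "STOP" is assumed to be the finish reason of a normally completed response.

```python
import json


def parse_candidate(candidate: dict):
    """Return the parsed JSON payload, or None if it is truncated or invalid.

    Hypothetical helper; the candidate layout mirrors the example response.
    """
    # A MAX_TOKENS finish means generation was cut off, so the JSON
    # structure is most likely not closed.
    if candidate.get("finishReason") == "MAX_TOKENS":
        return None

    parts = candidate.get("content", {}).get("parts", [])
    text = "".join(part.get("text", "") for part in parts)

    # Even with a normal finish reason, validate before trusting the text.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


truncated = {
    "finishReason": "MAX_TOKENS",
    "content": {"parts": [{"text": '{"assets": [{"name": "pum'}]},
}
complete = {
    "finishReason": "STOP",
    "content": {"parts": [{"text": '{"assets": []}'}]},
}

truncated_result = parse_candidate(truncated)
complete_result = parse_candidate(complete)
```

Requests whose result is None can then be collected and resubmitted, rather than failing later on a JSON parse error.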
Additional metadata:
{
    "promptTokenCount": 8137,
    "thoughtsTokenCount": 2168,
    "candidatesTokenCount": 6010,
    "totalTokenCount": 16315
}
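It also appears that the generated token count can exceed, or otherwise behave inconsistently with, the configured max_output_tokens, especially in Batch mode: in the metadata above, candidatesTokenCount is only 6010, below the 8192 limit, yet the response still finished with MAX_TOKENS.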
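A quick check of the numbers in the metadata above may explain part of this. The sketch below only does arithmetic on the reported counts; the interpretation that thinking tokens are charged against max_output_tokens is an assumption suggested by the numbers, not something confirmed here.

```python
# Usage metadata copied from the response above.
usage = {
    "promptTokenCount": 8137,
    "thoughtsTokenCount": 2168,
    "candidatesTokenCount": 6010,
    "totalTokenCount": 16315,
}
MAX_OUTPUT_TOKENS = 8192  # value set in generation_config

# Tokens produced by the model: thinking tokens plus visible output tokens.
generated = usage["thoughtsTokenCount"] + usage["candidatesTokenCount"]

# Remaining budget IF thinking tokens count against max_output_tokens
# (assumption, see the note above).
headroom = MAX_OUTPUT_TOKENS - generated

# Sanity check: prompt + generated should equal the reported total.
total_matches = usage["promptTokenCount"] + generated == usage["totalTokenCount"]

print(generated)      # 8178
print(headroom)       # 14
print(total_matches)  # True
```

If that assumption holds, only about 6000 of the 8192 tokens were left for the JSON itself, which would be consistent with the truncation.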
Is this a known limitation of Gemini Batch inference with structured JSON output?
Are there recommended best practices to prevent incomplete JSON responses when using:
- response_json_schema
- long OCR/image analysis outputs
- Batch API
- finishReason = MAX_TOKENS
Would reducing prompt size or splitting outputs into smaller chunks be the recommended approach here?
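For reference, the splitting approach could look roughly like the sketch below, which groups the input parts into smaller requests so that each response has less to output. The helper names chunk and build_requests, the "key" field, and the model name are hypothetical; the request body follows the template above.

```python
def chunk(items: list, size: int) -> list:
    """Split items into consecutive groups of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_requests(parts: list, schema: dict, model_name: str,
                   parts_per_request: int,
                   max_output_tokens: int = 8192) -> list:
    """Build one batch request per group of parts (hypothetical helper)."""
    requests = []
    for index, group in enumerate(chunk(parts, parts_per_request)):
        requests.append({
            # "key" is an assumed identifier used to match each response
            # back to its chunk; adjust to whatever the batch file expects.
            "key": f"chunk-{index}",
            "request": {
                "contents": [{"role": "user", "parts": group}],
                "model": f"models/{model_name}",
                "generation_config": {
                    "response_mime_type": "application/json",
                    "response_json_schema": schema,
                    "temperature": 0.1,
                    "max_output_tokens": max_output_tokens,
                },
            },
        })
    return requests


# Example: five page-level parts split into requests of two parts each.
pages = [{"text": f"page {i}"} for i in range(5)]
batch = build_requests(pages, {"type": "object"}, "example-model", 2)
```

Any shared instruction part would need to be repeated in every group, and the per-chunk results merged afterwards, so this trades fewer truncations for more requests.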