Gemini Batch API invalid JSON when finish_reason is MAX_TOKENS

When I run Gemini Batch API requests with response_json_schema, the model often returns truncated, and therefore invalid, JSON.

Here is the request template I’m using:

"request": {
  "contents": [
    {
      "role": "user",
      "parts": parts
    }
  ],
  "model": f"models/{Config.GEMINI_MODEL_NAME}",
  "generation_config": {
    "response_mime_type": "application/json",
    "response_json_schema": flat_asset_schema,
    "temperature": 0.1,
    "max_output_tokens": 8192
  }
}

The response often ends with:

"finishReason": "MAX_TOKENS"

and the JSON becomes incomplete/invalid because the generation is truncated before the structure is closed.

Example response:

{
  "finishReason": "MAX_TOKENS",
  "content": {
    "parts": [
      {
        "text": "{ ... truncated JSON ... }"
      }
    ]
  }
}
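
For reference, this is roughly how such a response could be screened before the JSON is used. It is a minimal sketch, assuming each candidate is a dict shaped like the example above; the function name extract_json and the status strings are hypothetical and not part of any Gemini SDK.

```python
import json


def extract_json(candidate: dict):
    """Return (parsed_json, status) for one candidate dict.

    status is "ok", "truncated" (finishReason is MAX_TOKENS), or
    "invalid_json" (the text does not parse for some other reason).
    """
    finish_reason = candidate.get("finishReason")
    parts = candidate.get("content", {}).get("parts", [])
    # The JSON payload may be spread over several text parts.
    text = "".join(part.get("text", "") for part in parts)

    # A MAX_TOKENS finish means generation was cut off, so the text is
    # not trusted even if it happens to parse.
    if finish_reason == "MAX_TOKENS":
        return None, "truncated"

    try:
        return json.loads(text), "ok"
    except json.JSONDecodeError:
        return None, "invalid_json"
```

Candidates flagged as "truncated" can then be collected and resubmitted with a smaller input or a larger output limit instead of being passed to downstream parsing.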

Additional metadata:

{
  "promptTokenCount": 8137,
  "thoughtsTokenCount": 2168,
  "candidatesTokenCount": 6010,
  "totalTokenCount": 16315
}

The token counts also seem inconsistent with the configured max_output_tokens, especially in Batch mode: in the metadata above, candidatesTokenCount is 6010, well below the 8192 limit, yet generation still stopped with MAX_TOKENS.
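
One possible reading of these numbers is that thinking tokens are counted against the same output limit as the visible answer. The arithmetic below checks that reading against the metadata above. It is only a sketch under that assumption, which this post does not confirm, and output_budget_used is a hypothetical helper name.

```python
# Usage metadata copied from the response above.
usage = {
    "promptTokenCount": 8137,
    "thoughtsTokenCount": 2168,
    "candidatesTokenCount": 6010,
    "totalTokenCount": 16315,
}
MAX_OUTPUT_TOKENS = 8192  # value configured in the request


def output_budget_used(usage: dict) -> int:
    """Sum thinking and answer tokens (assumes both count toward the limit)."""
    return usage.get("thoughtsTokenCount", 0) + usage.get("candidatesTokenCount", 0)


used = output_budget_used(usage)
remaining = MAX_OUTPUT_TOKENS - used
print(used, remaining)  # 8178 14
```

Under that assumption only 14 tokens of the 8192 were left when generation stopped, which would explain a MAX_TOKENS finish even though the answer itself is only 6010 tokens long.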

Is this a known limitation of Gemini Batch inference with structured JSON output?

Are there recommended best practices for preventing incomplete JSON responses when all of the following apply?

  • response_json_schema is set

  • the output is long (OCR/image analysis)

  • the request runs through the Batch API

  • the response ends with finishReason = MAX_TOKENS

Would reducing prompt size or splitting outputs into smaller chunks be the recommended approach here?
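
To make the chunking idea concrete, here is a rough sketch of what I have in mind. It assumes the input can be separated into shared instruction parts and independent per-page parts, and that every batch entry carries a "key" next to "request". The helper names, the key format, and the shared/page split are all hypothetical; the generation_config fields are the ones from my template above.

```python
def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_requests(shared_parts, page_parts, schema, model_name,
                   pages_per_request, max_output_tokens=8192):
    """Build one batch entry per group of pages.

    The shared instruction parts are repeated in every entry so each
    request is self-contained.
    """
    entries = []
    for index, group in enumerate(chunk(page_parts, pages_per_request)):
        entries.append({
            # Hypothetical identifier used to match results back to chunks.
            "key": f"chunk-{index}",
            "request": {
                "contents": [{"role": "user", "parts": shared_parts + group}],
                "model": f"models/{model_name}",
                "generation_config": {
                    "response_mime_type": "application/json",
                    "response_json_schema": schema,
                    "temperature": 0.1,
                    "max_output_tokens": max_output_tokens,
                },
            },
        })
    return entries
```

With five pages and pages_per_request=2 this yields three entries, so each response only has to cover one or two pages of output and is less likely to reach the token limit. The per-chunk results would still need to be merged afterwards.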