We’re evaluating the file-based Gemini Batch API as an alternative to synchronous generateContent calls for large-scale structured output jobs (~300 requests per batch, JSON response format, with Google Search grounding and thinking enabled), using the Gemini 3.1 Pro model.
We’re seeing a ~30% failure rate in batch mode compared to ~6% with the synchronous API for the same prompts (reproducible). The failures appear to be partial JSON responses — the model runs out of output tokens before completing the JSON structure, resulting in unparseable output. These don’t show up as errors on the AI Studio dashboard.
A few questions:
- Is the AI Studio dashboard the right place to track batch requests? If not, where should we be looking?
- Are we billed for requests that return partial/truncated responses? Since the model does generate tokens before truncating, we assume so, but want to confirm.
- Is there a recommended way to reduce this truncation (e.g., does the batch API use a different default `max_output_tokens` than the synchronous API)?
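For reference, one workaround we’re considering is setting `max_output_tokens` explicitly on every request in the batch file, so truncation no longer depends on whatever default batch mode applies. A minimal sketch of building one JSONL line this way is below; the `key`/`request` envelope and field names reflect our reading of the batch request format and the GenerateContentRequest schema, so treat them as assumptions rather than a verified spec.

```python
import json

# Sketch: one JSONL line for the batch input file, with maxOutputTokens
# set explicitly per request (assumed field names, not a verified spec).
request_line = {
    "key": "request-001",  # arbitrary per-request identifier we assign
    "request": {
        "contents": [{"parts": [{"text": "…prompt here…"}]}],
        "generationConfig": {
            "responseMimeType": "application/json",
            "maxOutputTokens": 65536,  # explicit cap instead of any default
        },
    },
}

# Each batch request is one JSON object per line in the uploaded file.
print(json.dumps(request_line))
```

If the batch API does use a lower default output cap than the synchronous API, this should surface as the failure rates converging once the cap is pinned.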