Gemini Batch API vs Streaming API — Reliability

We’re evaluating the Gemini Batch API (file-based) as an alternative to streaming generateContent calls for large-scale structured-output jobs (~300 requests per batch, JSON response format, with Google Search grounding and thinking enabled), using the Gemini 3.1 Pro model.

We’re seeing a ~30% failure rate in batch mode versus ~6% with the streaming API for the same prompts (reproducible). The failures appear to be partial JSON responses — the model runs out of output tokens before completing the JSON structure, leaving unparseable output. These failures don’t surface as errors on the AI Studio dashboard.
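For context, this is roughly how we detect the failures when post-processing the downloaded results file. The field names (`response`, `candidates`, `finishReason`, `content.parts`) mirror the generateContent response schema; the exact envelope of each batch output line is our assumption, so treat this as a sketch:

```python
import json

def classify_batch_line(line: str) -> str:
    """Classify one line of a batch output JSONL file.

    Assumes each line wraps a generateContent-style response under a
    "response" key; verify against the actual batch output format.
    """
    record = json.loads(line)
    candidates = record.get("response", {}).get("candidates", [])
    if not candidates:
        return "no_candidates"
    cand = candidates[0]
    if cand.get("finishReason") == "MAX_TOKENS":
        return "truncated"  # ran out of output tokens mid-JSON
    text = "".join(
        part.get("text", "")
        for part in cand.get("content", {}).get("parts", [])
    )
    try:
        json.loads(text)  # structured output should be valid JSON
        return "ok"
    except json.JSONDecodeError:
        return "unparseable"
```

Tallying these categories across the ~300 lines is where the ~30% figure comes from: almost all failures land in `truncated`.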

A few questions:

  • Is the AI Studio dashboard the right place to track batch requests, or is there a better tool for this?

  • Are we billed for requests that return partial/truncated responses? Since the model does generate tokens before truncating, we assume so, but want to confirm.

  • Is there a recommended way to reduce this (e.g., does the batch API use a different default max_output_tokens than the synchronous API)?
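On the last point, the mitigation we’re currently testing is setting the token cap explicitly per request in the batch input file rather than relying on any default. The `{"key": ..., "request": ...}` line envelope is our assumption about the batch input format, and the cap value here is arbitrary; the `generationConfig` fields mirror the synchronous generateContent request:

```python
import json

def build_batch_line(key: str, prompt: str, max_output_tokens: int) -> str:
    """Build one JSONL line for a batch input file.

    The {"key": ..., "request": ...} envelope is assumed; the request
    body follows the generateContent request schema.
    """
    request = {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            # Explicit cap, so batch and synchronous runs behave alike.
            "maxOutputTokens": max_output_tokens,
            # Constrain output to JSON for structured-output jobs.
            "responseMimeType": "application/json",
        },
    }
    return json.dumps({"key": key, "request": request})
```

If the batch API does use a lower default `max_output_tokens`, pinning it this way should make the two modes comparable.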