Massive Regression: `candidateCount > 1` (Multiple Candidates) disabled in `gemini-3-flash-preview` and newer models

In the latest gemini 3 preview models (3 flash and 3.1 pro) any request with candidateCount > 1 returns a 400 INVALID_ARGUMENT error with the message: "Multiple candidates is not enabled for this model".

This is a significant regression for creative writing and brainstorming applications.

By forcing candidateCount to 1, we are forced to make multiple sequential or parallel API calls, which:

  1. Increases perceived latency for the end-user.
  2. Increases overhead and cost (since the prompt prefix is re-processed multiple times).
  3. Breaks the efficiency of the “generate many, select one” paradigm that LLMs are traditionally great at.

Are there plans to re-enable multiple candidates for these models, or is this a permanent architectural shift? If it’s the latter, it severely impacts the viability of Gemini for creative assistive tools compared to other providers like OpenAI that still support n > 1.

Looking forward to your clarification.

The current restriction on multiple candidates is a significant change for creative writing and brainstorming workflows that rely on the “generate many, select one” approach.

## Current Technical Status

The Gemini 3 Flash and Gemini 3.1 Pro preview models support a of only 1. Attempting to request more results will trigger the 400 INVALID_ARGUMENT error.

## Why this change exists

The current single-candidate focus is often tied to:

Reasoning Capabilities: Newer models use complex internal “thinking” processes. Managing multiple reasoning chains simultaneously increases latency and compute costs significantly.

Feature Integration: Advanced features like Google Search Grounding and structured output are currently optimized for single-stream generation to ensure higher accuracy.

Recommended Workarounds

To maintain your application’s performance, the following adjustments are suggested:

Parallel Requests: Execute multiple API calls simultaneously. This increases token overhead, but it is the most direct way to retrieve diverse options for the end-user.

Temperature Tuning: Set a higher temperature (e.g., 0.9+) to ensure that individual parallel calls yield distinct, creative variations.

Batch API: Use the Gemini Batch API to process multiple prompts more cost-effectively for tasks where immediate user feedback isn’t required.

Looking Ahead

------------------------------

To help provide a more specific solution, please provide the following information:

* What is the target latency for these creative prompts?

* Are specific features like Grounding or File API used alongside these requests?

* Would a discounted batch processing rout

e work for your use case?