Massive Regression: `candidateCount > 1` (Multiple Candidates) disabled in `gemini-3-flash-preview` and newer models

Amit_Tzah · April 30, 2026, 8:25pm

In the latest gemini 3 preview models (3 flash and 3.1 pro) any request with candidateCount > 1 returns a 400 INVALID_ARGUMENT error with the message: "Multiple candidates is not enabled for this model".

This is a significant regression for creative writing and brainstorming applications.

By forcing candidateCount to 1, we are forced to make multiple sequential or parallel API calls, which:

Increases perceived latency for the end-user.
Increases overhead and cost (since the prompt prefix is re-processed multiple times).
Breaks the efficiency of the “generate many, select one” paradigm that LLMs are traditionally great at.

Are there plans to re-enable multiple candidates for these models, or is this a permanent architectural shift? If it’s the latter, it severely impacts the viability of Gemini for creative assistive tools compared to other providers like OpenAI that still support n > 1.

Looking forward to your clarification.

sarvesh_Deshpande · May 1, 2026, 3:46pm

The current restriction on multiple candidates is a significant change for creative writing and brainstorming workflows that rely on the “generate many, select one” approach.

## Current Technical Status

The Gemini 3 Flash and Gemini 3.1 Pro preview models support a of only 1. Attempting to request more results will trigger the 400 INVALID_ARGUMENT error.

## Why this change exists

The current single-candidate focus is often tied to:

Reasoning Capabilities: Newer models use complex internal “thinking” processes. Managing multiple reasoning chains simultaneously increases latency and compute costs significantly.

Feature Integration: Advanced features like Google Search Grounding and structured output are currently optimized for single-stream generation to ensure higher accuracy.

Recommended Workarounds

To maintain your application’s performance, the following adjustments are suggested:

Parallel Requests: Execute multiple API calls simultaneously. This increases token overhead, but it is the most direct way to retrieve diverse options for the end-user.

Temperature Tuning: Set a higher temperature (e.g., 0.9+) to ensure that individual parallel calls yield distinct, creative variations.

Batch API: Use the Gemini Batch API to process multiple prompts more cost-effectively for tasks where immediate user feedback isn’t required.

Looking Ahead

------------------------------

To help provide a more specific solution, please provide the following information:

* What is the target latency for these creative prompts?

* Are specific features like Grounding or File API used alongside these requests?

* Would a discounted batch processing rout

e work for your use case?

Topic		Replies	Views
Multiple candidates (candidateCount) is not supported for image generation models Gemini API api , gemini , gemini-3	0	210	February 20, 2026
CandidateCount > 1 returns 400 "Multiple candidates is not enabled for this model" on Gemini 3.x models (regression) #1609 Gemini API bug	0	61	May 18, 2026
Limit on CandidateCount - intentional or temporarily? Gemini API	3	272	May 29, 2024
Candidate_count for multiple output from Gemini LLM Gemini API api , models	4	2573	October 7, 2024
gemini-2.5-flash-image — Cannot reliably control number of output images per single API call Gemini API api , gemini-flash-2-5	0	51	April 27, 2026

Massive Regression: `candidateCount > 1` (Multiple Candidates) disabled in `gemini-3-flash-preview` and newer models

Related topics