User Written Introduction of Issue
In OpenCode and Gemini CLI, using OAuth or an API key, I cannot get Gemini 3 Flash Thinking to write an .md file that is longer than ~3000 tokens in one operation. This prevents Gemini from being able to comprehensively plan and write documents, such as a project SPEC. This is something Claude models do quite well, despite having a much smaller context window.
The Gemini models are represented as having a ~65k output limit, but in practice I cannot get even a small fraction of that written to a file, which has me wondering why.
After several hours of trying to figure this out myself, and talking with Gemini and ChatGPT, I'm at a loss. I've tried many changes to settings (maxOutputTokens, thinking on/off, etc.) with no meaningful change. It feels like there is some cap on output that prevents Gemini from producing useful documents and responses. This severely degrades Gemini's performance on high-complexity tasks, and limits its utility on low-complexity tasks that a model like Sonnet 4.5, despite its less robust reasoning, excels at (e.g., data extraction and consolidation).
Currently, Gemini has zero capacity for orchestration in my workflow due to its severely constrained (or broken?) output limits. It cannot write the documentation most of my projects require to progress.
GPT 5.2 Written Distillation of Testing Data (User Reviewed)
What I tested (REST API evidence, in addition to OpenCode + Gemini CLI)
Environment:
- Windows 10
- Windows Terminal, PowerShell
- Gemini API via AI Studio API key (also tested with a paid-tier key)
- Endpoint:
https://generativelanguage.googleapis.com/v1beta/...
1) models.get reports 65,536 output tokens (so the model metadata agrees with the published limit)
When I run models.get for models/gemini-3-flash-preview, the API reports:
```
inputTokenLimit: 1048576
outputTokenLimit: 65536
temperature: 1
topP: 0.95
topK: 64
thinking: true
```
So as far as get_model / models.get is concerned, the output limit is exactly what the docs imply: 65,536 output tokens.
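For reference, here is a minimal Python (requests) sketch of that models.get call, equivalent to what I ran from PowerShell (GEMINI_API_KEY is just my placeholder for however you store the key):

```python
import os
import requests

API_KEY = os.environ["GEMINI_API_KEY"]  # placeholder env var name
MODEL = "models/gemini-3-flash-preview"

resp = requests.get(
    f"https://generativelanguage.googleapis.com/v1beta/{MODEL}",
    params={"key": API_KEY},
    timeout=30,
)
resp.raise_for_status()
info = resp.json()

# In my runs this prints 1048576 and 65536.
print(info["inputTokenLimit"], info["outputTokenLimit"])
```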
2) But generateContent stops around ~3k output tokens with finishReason: STOP (not MAX_TOKENS), even with huge maxOutputTokens
I repeatedly asked Flash 3 to output very long markdown (e.g., "Output a Markdown document that is 20,000 tokens long...") with:
- maxOutputTokens = 60000
- thinking enabled (thinkingLevel HIGH)
- also tested removing the thinking config entirely
What I get back over and over is:
- finishReason: STOP
- candidatesTokenCount: 2952 (typical)
- thoughtsTokenCount: ~700–800 (varies)
- Output length: roughly 2,000 words / ~13k characters
I also ran a variant prompt ("as long as possible") and got essentially the same behavior:
- finishReason: STOP
- candidatesTokenCount: 2982
So the model is not being cut off by a configured ceiling (maxOutputTokens) and it is not reporting a max-token termination. It just stops early on its own.
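For completeness, a minimal Python repro sketch of the generateContent call (prompt abbreviated as above; the thinkingConfig block mirrors the thinkingLevel HIGH setting I described, and removing it reproduces the no-thinking variant):

```python
import os
import requests

API_KEY = os.environ["GEMINI_API_KEY"]  # placeholder env var name
URL = ("https://generativelanguage.googleapis.com/v1beta/"
       "models/gemini-3-flash-preview:generateContent")

body = {
    "contents": [
        {"role": "user",
         "parts": [{"text": "Output a Markdown document that is 20,000 tokens long..."}]}
    ],
    "generationConfig": {
        "maxOutputTokens": 60000,
        # Delete thinkingConfig entirely to reproduce the no-thinking variant.
        "thinkingConfig": {"thinkingLevel": "HIGH"},
    },
}

resp = requests.post(URL, params={"key": API_KEY}, json=body, timeout=300)
resp.raise_for_status()
data = resp.json()

# In my runs: finishReason "STOP", candidatesTokenCount ~2952,
# thoughtsTokenCount ~700-800, despite maxOutputTokens = 60000.
print(data["candidates"][0]["finishReason"])
print(data["usageMetadata"])
```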
3) Same behavior with a paid-tier key (so it does not look like a free-tier-only restriction)
I repeated Flash 3 tests using an API key tied to a paid tier, and the results were the same: ~2952 output tokens and finishReason: STOP.
4) Side note: gemini-3-pro-preview returned 429 RESOURCE_EXHAUSTED on free tier for me
When I attempted Pro, the API returned HTTP 429 with RESOURCE_EXHAUSTED and quota metrics indicating free-tier limits effectively at 0 for that model in my case. (This may be unrelated to the Flash 3 output-stopping issue, but I'm including it for completeness.)
What Iām asking
I'm looking for feedback or information pointing to a solution or workaround, or confirmation that I should give up on Gemini entirely and divert all development resources to Anthropic and OpenAI.
Specifically:
- Is there a known issue where Gemini 3 Flash (Thinking) self-terminates around ~3k output tokens with finishReason: STOP, even when maxOutputTokens is set very high?
- Is there any documented mechanism to discourage early stopping for long-form generation (SPECs, long reports), or is the correct approach to "continue" across multiple turns / multiple calls? (A sketch of the continuation approach I mean follows this list.)
- If models.get reports outputTokenLimit: 65536, is it expected that a single generateContent call still cannot practically produce anywhere near that in one response?
- Are there recommended generation settings for long-form output (temperature, topP/topK, other flags) that actually allow outputs of multiple tens of thousands of tokens in a single call?
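On the "continue across multiple turns" question, this is the kind of loop I mean. A naive sketch only; the prompt wording, round cap, and stitching logic are all mine, and real code would need a proper done-detection heuristic:

```python
import os
import requests

API_KEY = os.environ["GEMINI_API_KEY"]  # placeholder env var name
URL = ("https://generativelanguage.googleapis.com/v1beta/"
       "models/gemini-3-flash-preview:generateContent")

def generate(contents):
    body = {"contents": contents,
            "generationConfig": {"maxOutputTokens": 60000}}
    r = requests.post(URL, params={"key": API_KEY}, json=body, timeout=300)
    r.raise_for_status()
    parts = r.json()["candidates"][0]["content"]["parts"]
    # Skip any thought-summary parts; keep only regular text.
    return "".join(p.get("text", "") for p in parts if not p.get("thought"))

contents = [{"role": "user",
             "parts": [{"text": "Write the full project SPEC in Markdown."}]}]
chunks = []
for _ in range(10):  # arbitrary cap on continuation rounds
    text = generate(contents)
    chunks.append(text)
    # Naive stitching: echo the model turn back and ask it to keep going.
    contents.append({"role": "model", "parts": [{"text": text}]})
    contents.append({"role": "user",
                     "parts": [{"text": "Continue exactly where you left off."}]})

document = "".join(chunks)
```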
Because right now, in actual use, the "65k output limit" is effectively meaningless for document authoring. The model just stops.
Any guidance, confirmation of whether this is a known limitation/bug, or a recommended workaround would be appreciated.

