Summary
We have identified a critical stability issue in gemini-3-flash-preview: in roughly 3-5% of requests (observed when sending 100+ prompts concurrently), the model enters an infinite reasoning loop (e.g., repetitively verifying incremental values).
This runaway process causes two concurrent failures:
- Max Token Exhaustion: The model consumes the entire maxOutputTokens limit (validated at 16k and 32k) while looping.
- Raw Logic Leak: When the generation is forcibly terminated by the limit, the internal reasoning buffer is returned as a standard text part without the thought: true metadata flag. This causes the API to present unfinished, garbage reasoning-loop text as the final “answer.”
The Core Issue
The issue is not simply a missing response, but a failure in the model’s stop condition during reasoning.
- The Trigger: The model enters a repetitive verification cycle (e.g., checking n, then n+1, then n+2…) without ever converging on a final answer.
- The Leak: When finishReason hits MAX_TOKENS during this loop, the API flushes the current buffer.
- The Consequence: The client receives a content.part containing the loop (e.g., “Wait, let’s check…”) but missing the thought: true tag. The parser incorrectly treats this as the final user-facing response.
Environment & Configuration
- Model: gemini-3-flash-preview
- Token Limits: Validated with maxOutputTokens set to 16k and 32k.
- Mode: Reproducible in both Batch mode and standard API calls.
- Frequency: Affects approximately 3-5% of logic/code-based responses.
Reproduction Details
- Prompt: The model is presented with a logic or bit-manipulation problem (e.g., “Bitwise Toggle”).
- Looping Behavior: Instead of deriving a general formula immediately, the model begins verifying the solution against specific integers incrementally (e.g., checking n=67108863, then n=67108864, and so on).
- Termination: The generation hits the token limit.
- Result: The intended answer is never produced. The output contains only the incomplete reasoning loop, incorrectly formatted as a standard text response.
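To triage affected responses at scale, we flag outputs that repeat the same verification phrase with only the integer changing. A minimal heuristic sketch (the `looks_like_reasoning_loop` helper and the repeat threshold are our own assumptions):

```python
import re
from collections import Counter

def looks_like_reasoning_loop(text: str, min_repeats: int = 3) -> bool:
    """Flag text where one short phrase repeats with only the number changing."""
    # Split on sentence boundaries (periods, newlines, ellipses).
    segments = [s.strip() for s in re.split(r"[.\n\u2026]+", text) if s.strip()]
    # Collapse specific integers so "check n = 67108863" and
    # "check n = 134217727" map to the same template.
    templates = Counter(re.sub(r"\d+", "<N>", s) for s in segments)
    return bool(templates) and templates.most_common(1)[0][1] >= min_repeats
```

Running this over the leaked text in the payload below flags it immediately, while normal answers pass through.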
Evidence / Example Payload
In the specific instance below, the model consumed 15,356 tokens in thoughts. The second content part contains the infinite loop text (“Wait, let’s check…”) but is missing the thought: true flag, causing it to be interpreted as the final answer.
Snippet of the Infinite Loop (Leaked Text):
“…Wait, let’s check n = 67108863… Correct. Wait, let’s check n = 67108864… Correct. Wait, let’s check n = 134217727… Correct. Wait, let’s check n = 134217728…”
Full API Response:
```json
{
  "response": {
    "responseId": ".....",
    "usageMetadata": {
      "totalTokenCount": 16233,
      "thoughtsTokenCount": 15356, // <--- Proof of runaway reasoning
      "candidatesTokenCount": 640
    },
    "modelVersion": "gemini-3-flash-preview",
    "candidates": [
      {
        "content": {
          "parts": [
            {
              "text": "**Algorithm for Bitwise Toggle**\n\nOkay, here's my line of thinking...",
              "thought": true
            },
            {
              // BUG: This part is raw reasoning loop but lacks "thought": true
              "text": "33554432 ^ 33554430 = 67108862... \n\n Wait, let's check `n = 67108863`... \n\n Wait, let's check `n = 67108864`... \n\n Wait, let's check `n = 134217728`...",
              "thoughtSignature": "....."
            }
          ],
          "role": "model"
        },
        "finishReason": "MAX_TOKENS"
      }
    ]
  }
}
```
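The usageMetadata alone is enough to detect the runaway case programmatically. A sketch of the check we apply (the `is_runaway_reasoning` helper and the 0.9 ratio threshold are our own assumptions), operating on the value of the `response` field above:

```python
def is_runaway_reasoning(response: dict, ratio_threshold: float = 0.9) -> bool:
    """Flag responses where nearly all tokens went to thoughts and the
    generation was cut off by MAX_TOKENS, as in the payload above."""
    usage = response.get("usageMetadata", {})
    total = usage.get("totalTokenCount", 0)
    thoughts = usage.get("thoughtsTokenCount", 0)
    truncated = any(
        c.get("finishReason") == "MAX_TOKENS"
        for c in response.get("candidates", [])
    )
    # 15356 / 16233 ≈ 0.95 in the payload above, well over the threshold.
    return truncated and total > 0 and thoughts / total >= ratio_threshold
```

We use this to route affected requests into a retry queue rather than returning the leaked text.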
Could you suggest any workarounds for this issue? Thank you so much.