Summary
We have identified a critical stability issue in gemini-3-flash-preview: in roughly 3-5% of requests (observed when sending 100+ prompts concurrently), the model enters an infinite reasoning loop (e.g., repetitively verifying incremental values).
This runaway process causes two concurrent failures:
- Max Token Exhaustion: The model consumes the entire maxOutputTokens limit (validated at 16k and 32k) while looping.
- Raw Logic Leak: When the generation is forcibly terminated by the limit, the internal reasoning buffer is returned as a standard text part without the thought: true metadata flag. This causes the API to present unfinished, garbage reasoning-loop text as the final “answer.”
The Core Issue
The issue is not simply a missing response, but a failure in the model’s stop condition during reasoning.
- The Trigger: The model enters a repetitive verification cycle (e.g., checking n, then n+1, then n+2…) without ever converging on a final answer.
- The Leak: When finishReason hits MAX_TOKENS during this loop, the API flushes the current buffer.
- The Consequence: The client receives a content.part containing the loop (e.g., “Wait, let’s check…”) but missing the thought: true tag. The parser incorrectly treats this as the final user-facing response.
Environment & Configuration
- Model: gemini-3-flash-preview
- Token Limits: Validated with maxOutputTokens set to 16k and 32k.
- Mode: Reproducible in both Batch mode and standard API calls.
- Frequency: Affects approximately 3-5% of logic/code-based responses.
Reproduction Details
- Prompt: The model is presented with a logic or bit-manipulation problem (e.g., “Bitwise Toggle”).
- Looping Behavior: Instead of deriving a general formula immediately, the model begins verifying the solution against specific integers incrementally (e.g., checking n=67108863, then n=67108864, and so on).
- Termination: The generation hits the token limit.
- Result: The intended answer is never produced. The output contains only the incomplete reasoning loop, incorrectly formatted as a standard text response.
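To triage affected responses at scale, we flag outputs that repeat the same verification phrase with only the integer changing. A minimal heuristic sketch (the `looks_like_reasoning_loop` helper and the repeat threshold are our own assumptions):

```python
import re
from collections import Counter

def looks_like_reasoning_loop(text: str, min_repeats: int = 3) -> bool:
    """Flag text where one short phrase repeats with only the number changing."""
    # Split on sentence boundaries (periods, newlines, ellipses).
    segments = [s.strip() for s in re.split(r"[.\n\u2026]+", text) if s.strip()]
    # Collapse specific integers so "check n = 67108863" and
    # "check n = 134217727" map to the same template.
    templates = Counter(re.sub(r"\d+", "<N>", s) for s in segments)
    return bool(templates) and templates.most_common(1)[0][1] >= min_repeats
```

Running this over the leaked text in the payload below flags it immediately, while normal answers pass through.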
Evidence / Example Payload
In the specific instance below, the model consumed 15,356 tokens in thoughts. The second content part contains the infinite loop text (“Wait, let’s check…”) but is missing the thought: true flag, causing it to be interpreted as the final answer.
Snippet of the Infinite Loop (Leaked Text):
“…Wait, let’s check n = 67108863… Correct. Wait, let’s check n = 67108864… Correct. Wait, let’s check n = 134217727… Correct. Wait, let’s check n = 134217728…”
Full API Response:
```json
{
  "response": {
    "responseId": ".....",
    "usageMetadata": {
      "totalTokenCount": 16233,
      "thoughtsTokenCount": 15356, // <--- Proof of runaway reasoning
      "candidatesTokenCount": 640
    },
    "modelVersion": "gemini-3-flash-preview",
    "candidates": [
      {
        "content": {
          "parts": [
            {
              "text": "**Algorithm for Bitwise Toggle**\n\nOkay, here's my line of thinking...",
              "thought": true
            },
            {
              // BUG: This part is raw reasoning loop but lacks "thought": true
              "text": "33554432 ^ 33554430 = 67108862... \n\n Wait, let's check `n = 67108863`... \n\n Wait, let's check `n = 67108864`... \n\n Wait, let's check `n = 134217728`...",
              "thoughtSignature": "....."
            }
          ],
          "role": "model"
        },
        "finishReason": "MAX_TOKENS"
      }
    ]
  }
}
```
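The usageMetadata alone is enough to detect the runaway case programmatically. A sketch of the check we apply (the `is_runaway_reasoning` helper and the 0.9 ratio threshold are our own assumptions), operating on the value of the `response` field above:

```python
def is_runaway_reasoning(response: dict, ratio_threshold: float = 0.9) -> bool:
    """Flag responses where nearly all tokens went to thoughts and the
    generation was cut off by MAX_TOKENS, as in the payload above."""
    usage = response.get("usageMetadata", {})
    total = usage.get("totalTokenCount", 0)
    thoughts = usage.get("thoughtsTokenCount", 0)
    truncated = any(
        c.get("finishReason") == "MAX_TOKENS"
        for c in response.get("candidates", [])
    )
    # 15356 / 16233 ≈ 0.95 in the payload above, well over the threshold.
    return truncated and total > 0 and thoughts / total >= ratio_threshold
```

We use this to route affected requests into a retry queue rather than returning the leaked text.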
Could you suggest any workarounds for this issue? Thank you so much.