Gemini 3 Flash Preview – Infinite Reasoning Loop Causing Max Token Exhaustion & Raw Logic Leak

Summary

We have identified a critical stability issue in gemini-3-flash-preview where the model frequently (3-5% of requests when we send 100+ prompts concurrently) enters an infinite reasoning loop (e.g., repetitively verifying incremental values).

This runaway process causes two concurrent failures:

  1. Max Token Exhaustion: The model consumes the entire maxOutputTokens limit (validated at 16k and 32k) while looping.

  2. Raw Logic Leak: When the generation is forcibly terminated by the limit, the internal reasoning buffer is returned as a standard text part without the thought: true metadata flag. This causes the API to present unfinished, garbage reasoning loop text as the final “answer.”

The Core Issue

The issue is not simply a missing response, but a failure in the model’s stop condition during reasoning.

  1. The Trigger: The model enters a repetitive verification cycle (e.g., checking n, then n+1, then n+2…) without ever converging on a final answer.

  2. The Leak: When finishReason hits MAX_TOKENS during this loop, the API flushes the current buffer.

  3. The Consequence: The client receives a content.part containing the loop (e.g., “Wait, let’s check…”) but missing the thought: true tag. The parser incorrectly treats this as the final user-facing response.
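The leak is detectable client-side. Below is a minimal heuristic sketch (our own guard, not a documented API contract — the assumption that a leaked part carries a `thoughtSignature` but no `thought` flag is based purely on the payloads we observed):

```python
def looks_like_leaked_reasoning(candidate: dict) -> bool:
    """Heuristic: flag a candidate whose generation was cut off by
    MAX_TOKENS and whose final content part looks like reasoning text
    that leaked without the `thought: true` flag."""
    if candidate.get("finishReason") != "MAX_TOKENS":
        return False
    parts = candidate.get("content", {}).get("parts", [])
    if not parts:
        return False
    last = parts[-1]
    # In the payloads we saw, the leaked part carries a thoughtSignature
    # but is missing the thought flag.
    return "thoughtSignature" in last and not last.get("thought", False)
```

A parser can use this to discard the candidate instead of surfacing the loop text as the final answer.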

Environment & Configuration

  • Model: gemini-3-flash-preview

  • Token Limits: Validated with maxOutputTokens set to 16k and 32k.

  • Mode: Reproducible in both Batch mode and standard API calls.

  • Frequency: Affects approximately 3-5% of logic/code-based responses.

Reproduction Details

  1. Prompt: The model is presented with a logic or bit-manipulation problem (e.g., “Bitwise Toggle”).

  2. Looping Behavior: Instead of deriving a general formula immediately, the model begins verifying the solution against specific integers incrementally (e.g., checking n=67108863, then n=67108864, and so on).

  3. Termination: The generation hits the token limit.

  4. Result: The intended answer is never produced. The output contains only the incomplete reasoning loop, incorrectly formatted as a standard text response.

Evidence / Example Payload

In the specific instance below, the model consumed 15,356 tokens in thoughts. The second content part contains the infinite loop text (“Wait, let’s check…”) but is missing the thought: true flag, causing it to be interpreted as the final answer.

Snippet of the Infinite Loop (Leaked Text):

“…Wait, let’s check n = 67108863… Correct. Wait, let’s check n = 67108864… Correct. Wait, let’s check n = 134217727… Correct. Wait, let’s check n = 134217728…”
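Runaway text of this shape is easy to flag mechanically. A rough sketch (the phrase pattern and threshold are tuned to the loops we observed, not a general-purpose detector):

```python
import re

# Matches one incremental verification step, e.g. "Wait, let's check n = 67108863"
# (tolerates curly/straight apostrophes and optional backticks around n).
LOOP_STEP = re.compile(r"Wait, let[’']s check\s*`?n\s*=\s*(\d+)`?")

def is_runaway(text: str, threshold: int = 3) -> bool:
    """Flag text that repeats the incremental verification step."""
    return len(LOOP_STEP.findall(text)) >= threshold
```

Running it over the snippet above finds four verification steps, well past the threshold.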

Full API Response:

{
  "response": {
    "responseId": ".....",
    "usageMetadata": {
      "totalTokenCount": 16233,
      "thoughtsTokenCount": 15356,  // <--- Proof of runaway reasoning
      "candidatesTokenCount": 640
    },
    "modelVersion": "gemini-3-flash-preview",
    "candidates": [
      {
        "content": {
          "parts": [
            {
              "text": "**Algorithm for Bitwise Toggle**\n\nOkay, here's my line of thinking...",
              "thought": true
            },
            {
              // BUG: This part is raw reasoning loop but lacks "thought": true
              "text": "33554432 ^ 33554430 = 67108862... \n\n Wait, let's check `n = 67108863`... \n\n Wait, let's check `n = 67108864`... \n\n Wait, let's check `n = 134217728`...",
              "thoughtSignature": "....."
            }
          ],
          "role": "model"
        },
        "finishReason": "MAX_TOKENS"
      }
    ]
  }
}
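Until there is a model-side fix, the usage metadata above suggests a retry guard. This is a heuristic sketch (the 0.9 ratio is our own arbitrary cutoff, not documented behavior): reject responses where generation was truncated and thoughts consumed nearly the whole budget.

```python
def should_retry(response: dict, thought_ratio: float = 0.9) -> bool:
    """Retry when a candidate was truncated by MAX_TOKENS and nearly the
    entire token budget went to thoughts (the runaway-loop signature)."""
    usage = response.get("usageMetadata", {})
    total = usage.get("totalTokenCount", 0)
    thoughts = usage.get("thoughtsTokenCount", 0)
    truncated = any(c.get("finishReason") == "MAX_TOKENS"
                    for c in response.get("candidates", []))
    return truncated and total > 0 and thoughts / total >= thought_ratio
```

For the payload above, 15,356 of 16,233 tokens (~95%) went to thoughts, so the guard would trigger a retry.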

Could you suggest if there are any workarounds for this issue? Thank you so much.

3 Likes

We experience the same problem sometimes. At one point, it consistently happened in the second turn of an agent run. Personally, I also sometimes see it in Cursor.

2 Likes

Hi @Tommy_Asai, welcome to the community!

Apologies for the delayed response.

I tried to reproduce the issue, but the model didn’t get stuck in a verification loop.

Could you please share the exact prompt you sent and also the exact configuration set, so that we can try again?

Thank you!

1 Like

I’m experiencing the same problem with gemini-3-flash-preview through AI Studio. These loops were introduced recently and did not exist in the initial release. The prompt itself doesn’t guarantee it will get stuck in a loop, though. I can provide logs with requests/responses if needed.

1 Like

Hi @Srikanta_K_N, thanks for your response.

Could you please share the exact prompt you sent and also the exact configuration set, so that we can try again?

Here is some information that might be relevant:

Provider Settings

thinking_config = ThinkingConfig(
    thinking_level=ThinkingLevel.HIGH,
    include_thoughts=True,  # or False; doesn't matter
)

generation_config = GenerateContentConfig(
    max_output_tokens=max_tokens,  # 16k or 32k
    temperature=0.0,
    thinking_config=thinking_config,
    system_instruction=system_message,  # see below
)

Prompts

We generated the prompts from hundreds of benchmark problems.

Example Dataset: MBPP (Mostly Basic Programming Problems) - Java translation

  • Source: nuprl/MultiPL-E

  • Total Problems: 386

  • Language: Java

  • Problem Type: Logic and algorithmic problems

System message

You are a coding assistant that specializes in Java. You are given a problem in natural language and a function signature.... (continues)

Example Problem (toggle_middle_bits - MBPP_java_319)

import java.util.*;
import java.lang.reflect.*;
import org.javatuples.*;
import java.security.*;
import java.math.*;
import java.io.*;
import java.util.stream.*;
class Problem {
    // Write a javathon function to toggle bits of the number except the first and the last bit. 
    // https://www.geeksforgeeks.org/toggle-bits-number-expect-first-last-bits/
    public static long toggleMiddleBits(long n) {