Gemini-2.5-flash generates infinite token sequences

For quite some time, I’ve noticed that the Gemini 2.5 Flash series (both Flash and Flash Lite) tends to generate unusually long token sequences — sometimes hitting the maximum output token limit, other times seemingly running forever.

  • LangSmith trace:

  • My parameters:

    from langchain_google_genai import ChatGoogleGenerativeAI

    llm = ChatGoogleGenerativeAI(
        model=model.model_name,
        response_mime_type="application/json",  # structured JSON output
        thinking_budget=0,  # disable thinking for fast replies
    )
    
    • thinking_budget=0 for fast, straightforward replies
    • temperature left to default (around 0.7, I assume)

Hi @codeonym, thanks for reaching out!

Could you please let us know what you are trying to achieve?

Is this always happening, or only when you send specific prompts?

If possible, can you provide steps to reproduce so I can try on my end?

Hi there @Srikanta_K_N, sure thing. Here is a link to the LangSmith trace for debugging: LangSmith (I’ve included only the relevant part)

Q:

Could you please let us know what you are trying to achieve?

A:

It’s a data refinement workflow (MD/HTML artifact) using structured output.

Q:

Is this always happening, or only when you send specific prompts?

A:

Yes, almost every time when refining the HTML/MD artifact.

@Srikanta_K_N here is another trace for debugging (max tokens reached): LangSmith

I’m also having this issue. I’ve found that 2.5 Flash starts generating an endless run of \n or \t characters when trying to produce non-English characters in a structured output. My understanding is that this was verified, reproduced, and fixed in gemini-2.5-flash-preview-09-2025. I’ve tried it and it seems to work fine.
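For anyone hitting the same failure mode, a minimal heuristic to flag affected responses might look like this (the 50-character threshold is my own assumption, not a documented limit):

```python
import re

# Heuristic for the runaway-whitespace failure mode described above:
# a long uninterrupted run of "\n" or "\t" in the model output.
RUNAWAY_RE = re.compile(r"[\n\t]{50,}")

def looks_runaway(text: str) -> bool:
    """Return True if the output contains a suspiciously long \\n/\\t run."""
    return bool(RUNAWAY_RE.search(text))
```

This lets you reject and retry the affected responses instead of storing corrupted artifacts.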

However, gemini-2.5-flash-preview-09-2025 does not respect thinking_budget=0 in combination with structured output. This has also been reported by several users.

The only idea I have left on how to handle this is to add stop sequences for the most common cases, and try to rerun these requests at a later point when this is fixed.
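As a sketch of that fallback: wrap the model call, check for the runaway pattern, and retry a bounded number of times. Here `invoke_with_guard`, the retry count, and the 20-character run threshold are all hypothetical choices; `call` would be e.g. a lambda around `llm.invoke(messages, stop=[...])` with your stop sequences bound:

```python
import time
from typing import Callable

def invoke_with_guard(call: Callable[[], str],
                      max_retries: int = 2,
                      delay_s: float = 0.0) -> str:
    """Retry a model call whose output looks like the runaway-whitespace bug.

    `call` is any zero-arg function returning the model's text output.
    The retry policy and the run-length heuristic are assumptions.
    """
    for _ in range(max_retries + 1):
        text = call()
        # Treat a long run of \n or \t as the failure mode and retry.
        if "\n" * 20 not in text and "\t" * 20 not in text:
            return text
        time.sleep(delay_s)
    raise RuntimeError("model kept producing runaway whitespace")
```

Requests that still fail after the retries can then be queued and rerun later, once the underlying model bug is fixed.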


I’ve switched to the preview model and I haven’t encountered that error yet. Thanks for pointing that out!