Gemini 2.5 Flash gets stuck in infinite token repetition during structured JSON output (LiteLLM)

vkadlec · May 7, 2026, 5:57pm

I’m running into an issue with Gemini 2.5 Flash when using structured JSON output through LiteLLM. In some cases, the model enters a repetition loop and keeps generating duplicated tokens or JSON fragments until the max output token limit is reached. The final response is often malformed or incomplete JSON.
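One way to tell this failure mode apart from ordinary truncation is to inspect each response before trusting it. The sketch below is only an illustration: `has_repeated_tail` and `classify_response` are hypothetical helpers, not part of LiteLLM. It also assumes the finish reason arrives as the OpenAI-style string `"length"` when the output token limit is hit, which should be verified against your own responses.

```python
import json


def has_repeated_tail(text: str, max_unit: int = 50, min_repeats: int = 8) -> bool:
    """Return True if the text ends with one short unit repeated many times."""
    for unit_len in range(1, max_unit + 1):
        needed = unit_len * min_repeats
        if len(text) < needed:
            break  # longer units cannot fit min_repeats times either
        unit = text[-unit_len:]
        if text[-needed:] == unit * min_repeats:
            return True
    return False


def classify_response(text: str, finish_reason: str) -> str:
    """Classify a structured-output response.

    Returns one of: "ok", "looped", "truncated", "invalid".
    Assumption: finish_reason == "length" means the token limit was reached.
    """
    try:
        json.loads(text)
        parsed = True
    except json.JSONDecodeError:
        parsed = False

    if parsed and finish_reason != "length":
        return "ok"
    # Check for a repetition loop before falling back to plain truncation.
    if has_repeated_tail(text):
        return "looped"
    if finish_reason == "length":
        return "truncated"
    return "invalid"
```

Logging the `"looped"` cases separately from `"truncated"` ones should make it easier to see how often the repetition actually occurs and with which prompts. The thresholds (`max_unit`, `min_repeats`) are arbitrary starting points.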

Has anyone seen this specifically with Gemini 2.5 Flash + LiteLLM?

endless_suffering · May 16, 2026, 8:39am

I'm using the same stack and seeing the same issue. I suspect it is intermittent, since it was already happening before I integrated LiteLLM into my stack. The model repeats tokens or output fragments until it hits the max_tokens limit, which results in malformed JSON. I tested mainly with 2.5 flash-lite, as well as 3.1 flash and 2.5 flash, and all of them showed the same issue. This might be related: I had fixed temperature=0 for close reproducibility, but raising it seems to reduce the repetition, at the cost of less accurate output.
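Given the observation that a higher temperature seems to reduce the looping, one possible workaround is to keep temperature=0 for the first attempt and only raise it when a response fails. This is a minimal sketch, not a confirmed fix: `generate_json_with_retries` and the `generate` callable are hypothetical names, the temperature ladder is arbitrary, and the `"length"` finish reason is assumed to signal that the token limit was hit.

```python
import json
from typing import Any, Callable, Optional, Sequence, Tuple


def generate_json_with_retries(
    generate: Callable[[float], Tuple[str, str]],
    temperatures: Sequence[float] = (0.0, 0.3, 0.7),
) -> Optional[Any]:
    """Try each temperature in order and return the first valid JSON result.

    `generate` is a caller-supplied function (hypothetical) that takes a
    temperature and returns (response_text, finish_reason).
    Returns None if every attempt fails.
    """
    for temperature in temperatures:
        text, finish_reason = generate(temperature)
        if finish_reason == "length":
            # Hit the output token limit: likely a loop or truncation, retry.
            continue
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            # Malformed JSON: retry at the next temperature.
            continue
    return None
```

With LiteLLM, `generate` could wrap a call to `litellm.completion(...)` with `temperature` and `max_tokens` set, returning `response.choices[0].message.content` and `response.choices[0].finish_reason`. Starting at 0 keeps the reproducible behavior for the common case and accepts the accuracy trade-off only on retries.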
