Bug Report: The model often starts creating repetitive sequences of tokens

rossanodr · June 26, 2024, 7:56pm

Summary:
When using the “gemini-1.5-flash” model for generating long texts, the model often starts creating repetitive sequences of tokens, leading to an infinite loop and exhausting the token limit. This issue is observed with both the Vertex and Gemini APIs.

Example:
```
“The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly. The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly. The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly. The judgment can be appealed…”
```


Steps to Reproduce:

1. Use the "gemini-1.5-flash" model via Vertex or Gemini API.
2. Generate a long text (e.g., legal or technical document).
3. Observe the generated output for repetition of phrases or sentences.

Expected Behavior:
The model should generate coherent and non-repetitive text.

Actual Behavior:
The model begins to repeat sequences of tokens indefinitely, leading to the maximum token limit being reached.

Impact:

- Wastes tokens and API usage limits.
- Generates unusable text, necessitating additional requests and costs.

Reproduction Rate:
Occurs frequently with long text generation tasks.

Workaround:
Currently, there is no known workaround to prevent this issue.

Request for Resolution:

- Investigate the cause of the repetitive token generation.
- Implement a fix to prevent the model from entering a repetitive loop.
- Provide a mechanism for users to request refunds or credits for tokens wasted due to this bug.
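
Until the cause is found, the looping output described in this report can at least be detected after the fact. The following is a minimal sketch, not an official mitigation: the function name `find_repeated_tail` and its thresholds are made up for illustration. It checks whether a response ends in the same unit of text repeated several times in a row, which also works when the output was cut off mid-sentence by the token limit.

```python
from typing import Optional


def find_repeated_tail(
    text: str,
    min_unit: int = 10,      # shortest repeating unit to look for (assumed value)
    min_repeats: int = 3,    # how many back-to-back copies count as a loop (assumed value)
    window: int = 4000,      # only inspect the end of the text to bound the cost
) -> Optional[str]:
    """Return the repeating unit if the text ends with it repeated
    at least `min_repeats` times in a row, otherwise None."""
    tail = text[-window:]
    max_unit = len(tail) // min_repeats
    for unit_len in range(min_unit, max_unit + 1):
        unit = tail[-unit_len:]
        # A truncated loop is still periodic, so comparing against the
        # last `unit_len` characters works even for a cut-off response.
        if tail.endswith(unit * min_repeats):
            return unit
    return None


if __name__ == "__main__":
    sentence = "The judgment can be appealed in a motion for reconsideration. "
    looping = "Some introduction. " + sentence * 4
    print(find_repeated_tail(looping) is not None)
```

A caller could use a non-None result to discard the response and retry instead of passing the text downstream.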

ruggiero.guida · July 4, 2024, 12:14pm

We are experiencing the same issue.

Siva_Sravana_Kumar_N · July 8, 2024, 6:59pm

Hi @ruggiero.guida @rossanodr,

Can you provide a prompt so that I can replicate the same?

Thanks!

ruggiero.guida · July 9, 2024, 11:16am

Thanks @Siva_Sravana_Kumar_N

The prompt contains private and sensitive information and we are not comfortable sharing it on a public forum. Would you be able to DM me a Google work email so we can send the info there?

rossanodr · July 10, 2024, 6:08pm

Unfortunately, I think the problem is with Gemini itself: it happens with many different prompts. The main trigger seems to be a large context. Let’s say your prompt is something like, “Read the document below and make a list of all the birth dates in it {list}”. If the document is large, there is a chance the model will start repeating the same date until it reaches the token limit.
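
If the response is consumed as a stream, one way to limit the wasted tokens in a case like the repeated date is to stop reading once the tail of the output keeps recurring. This is only a sketch under assumptions: `stop_on_loop` is a hypothetical helper, the thresholds are guesses, and `chunks` stands for any iterable of text pieces (how to get such a stream, and whether abandoning it actually stops billing, depends on the SDK in use).

```python
from typing import Iterable, Iterator


def stop_on_loop(
    chunks: Iterable[str],
    probe_len: int = 24,     # length of the tail snippet to look for (assumed value)
    max_repeats: int = 6,    # occurrences in the window that count as a loop (assumed value)
    window: int = 4000,      # how much recent output to keep for the check
) -> Iterator[str]:
    """Yield chunks unchanged, but stop consuming the stream once the most
    recent `probe_len` characters occur `max_repeats` times in the window."""
    seen = ""
    for chunk in chunks:
        yield chunk
        seen = (seen + chunk)[-window:]
        if len(seen) < probe_len:
            continue
        probe = seen[-probe_len:]
        if seen.count(probe) >= max_repeats:
            return  # looks like a loop: stop reading further chunks


if __name__ == "__main__":
    fake_stream = ["Birthdays: "] + ["1990-05-12, "] * 50
    received = list(stop_on_loop(fake_stream))
    print(f"read {len(received)} of {len(fake_stream)} chunks")
```

Legitimately repetitive output (tables, long lists with a shared prefix) can trip this check, so the thresholds would need tuning per task.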

XiaoLong_Zhang · September 4, 2024, 10:22pm

Gemini Flash 1.5
prompt =

'what we understand though is that nothing has has been decided and everything is really in the sort of preliminary stages but i think you know as the market is showing us today this is kind of a healthy and a natural thing for a company in this kind of a situation to be doing well we called it plan b but it might as well be plan cde e you know things have been difficult for intel we use the word chip maker a lot for all kinds of semiconductor companies but in intel's case it's actually true you split the business in two parts they design chips and what they do and then they manufacture them for themselves the problem they've got is that they don't currently manufacture chips really for anyone else and that's the financial issue yeah it's a it's a conundrum right they they're saying look we want to be a a foundry right we want to do what tsmc does in order to do that we need more factories we need more technology but that costs a lot of money the money comes from the products that they s...'
Process the above text according to the following steps:
Step 1. Restore only the punctuation that is missing from the original text:
        - Maintaining the original word order as much as possible
        - Each sentence should be on a separate line.
Step 2. Translate each sentence into Chinese one by one
        - Predict what type of content this is, and then translate it according to that type.

The screenshot:

Bastian_Machek · October 21, 2024, 8:34am

We’re also affected by this. It happens only with Flash, not Pro.

In our case the task is: “Describe the image contents, including all recognized objects.”
The system instruction is: “You are classifying images for photo management. Be very specific an detailed. Do not return more than 25 unique keywords.”

There’s also a given response structure:
generationConfig = {
  response_mime_type = "application/json",
  response_schema = {
    properties = {
      ImageCaption = { type = "STRING" },
      ImageTitle = { type = "STRING" },
      keywords = {
        properties = {
          ["Aktivitäten"] = { items = { type = "STRING" }, type = "ARRAY" },
          Fahrzeuge = { items = { type = "STRING" }, type = "ARRAY" },
          Firmen = { items = { type = "STRING" }, type = "ARRAY" },
          ["Gebäude"] = { items = { type = "STRING" }, type = "ARRAY" },
          ["Gegenstände"] = { items = { type = "STRING" }, type = "ARRAY" },
          Menschen = { items = { type = "STRING" }, type = "ARRAY" },
          Ort = { items = { type = "STRING" }, type = "ARRAY" },
          Pflanzen = { items = { type = "STRING" }, type = "ARRAY" },
          Stimmungen = { items = { type = "STRING" }, type = "ARRAY" },
          Szene = { items = { type = "STRING" }, type = "ARRAY" },
          Texte = { items = { type = "STRING" }, type = "ARRAY" },
          Tiere = { items = { type = "STRING" }, type = "ARRAY" },
          Wetterbedingungen = { items = { type = "STRING" }, type = "ARRAY" }
        },
        type = "OBJECT"
      }
    },
    type = "OBJECT"
  }
},

Don’t be confused by the German; this is just an example.

Any ideas how this could be fixed?

As I said, it doesn’t happen with Pro, but for some users Pro isn’t an option; they want to use Flash.
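
For a structured response like this one, a client-side check can at least catch broken results before they are stored. The sketch below is an assumption-heavy illustration, not a fix for the model: it assumes the loop shows up either as truncated (invalid) JSON or as the same keyword repeated inside the arrays, and `clean_keywords` is a hypothetical helper name. It deduplicates keywords across all categories and enforces the 25-keyword limit from the system instruction.

```python
import json
from typing import Optional

MAX_KEYWORDS = 25  # taken from the system instruction quoted above


def clean_keywords(raw_json: str, limit: int = MAX_KEYWORDS) -> Optional[dict]:
    """Parse the model's JSON reply, drop repeated keywords, and cap the
    total number of unique keywords. Returns None if the JSON is invalid
    (for example, cut off at the token limit) so the caller can retry."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not isinstance(data.get("keywords"), dict):
        return None

    seen = set()
    cleaned = {}
    for category, items in data["keywords"].items():
        unique = []
        for item in items if isinstance(items, list) else []:
            if not isinstance(item, str):
                continue
            key = item.strip().lower()  # compare case-insensitively
            if key and key not in seen and len(seen) < limit:
                seen.add(key)
                unique.append(item.strip())
        cleaned[category] = unique
    data["keywords"] = cleaned
    return data


if __name__ == "__main__":
    reply = '{"ImageTitle": "Harbor", "keywords": {"Fahrzeuge": ["Boot", "Boot", "Schiff"]}}'
    print(clean_keywords(reply))
```

This only cleans up after the fact; a response that looped until the token limit still costs the full token budget.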

Varshney_Kashish · November 11, 2024, 11:34am

Can anyone tell whether this behavior continues with gemini-flash-002? Please reply.

Felixvor · February 6, 2025, 8:26am

We had this happen in a few-shot document classification/entity extraction use case. We were able to fix it by downgrading google-cloud-aiplatform:

pip install "google-cloud-aiplatform==1.69.0" --force-reinstall

We can reproduce the error reliably in sandboxed environments by switching the package version back and forth: the latest version loops text output until the token limit is reached, while version 1.69.0 gives straightforward, correct output. Sadly, the few-shot documents used to create this problem are proprietary and I can’t share them. I really can’t imagine what the package is doing to influence the model outputs this badly, but that’s what we’re working with…

Maybe this solves the issue for anyone else who has this problem!
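
When switching versions back and forth like this, it is easy to lose track of which one is actually installed in a given environment. A small sketch using only the standard library (`importlib.metadata`) can confirm the pin before a test run. The helper names `installed_version` and `check_pins` are made up for illustration, and the check says nothing about whether the downgrade helps in other setups.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Version reported as working in this post; adjust as needed.
PINNED = {"google-cloud-aiplatform": "1.69.0"}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, actual)} for every package whose
    installed version differs from the pinned one."""
    mismatches = {}
    for package, expected in pins.items():
        actual = installed_version(package)
        if actual != expected:
            mismatches[package] = (expected, actual)
    return mismatches


if __name__ == "__main__":
    for package, (expected, actual) in check_pins(PINNED).items():
        print(f"{package}: expected {expected}, found {actual}")
```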

Jan_oliver · March 11, 2025, 9:26pm

This is still happening. We are experiencing it in Flash 2.0. Any feedback from the Gemini team so far?

Vyaas_Srinivasan · March 25, 2025, 10:58am

We are also facing this issue with the 2.0 Flash model. Is there any way to prevent this from happening?

codeofdutypm1 · April 7, 2025, 5:51am

I have the same problem with Gemini 2.0 Flash. Are there any updates on this? @Felixvor, I am experiencing this problem with the JavaScript API. Do you know how I can pin a specific package version like you did via pip?

rossanodr · April 11, 2025, 6:30pm

Same problem here. I’ve tried many different settings for temperature, etc.

NonAIGuy · June 30, 2025, 5:08pm

same here

....biraz yaxşı qurduğu qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum qurlum....(max tokens exceeded) stop

It spams like this indefinitely. I’m using the Go SDK with temperature 0.8, and it happens 20-30% of the time.

Really unreliable and annoying.
Is anyone able to apply workarounds?
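
One generic workaround for an intermittent failure like this is to detect a looping response and retry. The sketch below is shown in Python and is not tied to any SDK: `generate` is a placeholder for whatever function makes the model call, and the `min_ratio` threshold is a guess. It relies on the fact that text like the sample above compresses far better than normal prose. If failures were independent at a 30% rate, three attempts would all fail about 0.3^3 ≈ 2.7% of the time, although independence is an assumption.

```python
import zlib
from typing import Callable, Optional


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size; very low values mean the
    text is highly repetitive."""
    raw = text.encode("utf-8")
    if not raw:
        return 1.0
    return len(zlib.compress(raw)) / len(raw)


def generate_with_retry(
    generate: Callable[[], str],   # placeholder for the real model call
    max_attempts: int = 3,
    min_ratio: float = 0.2,        # below this the output is treated as a loop (assumed value)
    min_chars: int = 200,          # short texts are accepted without checking
) -> Optional[str]:
    """Call `generate` until it returns text that does not look like a
    repetition loop; return None if every attempt looks broken."""
    for _ in range(max_attempts):
        text = generate()
        if len(text) < min_chars or compression_ratio(text) >= min_ratio:
            return text
    return None


if __name__ == "__main__":
    looping = "biraz " + "qurlum " * 500
    print(round(compression_ratio(looping), 3))
```

Each retry still pays for the failed attempt, so this trades cost for reliability and works best combined with a lower output token limit.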

Firley_Company · July 13, 2025, 2:25pm

Yep, over a year later and this issue still persists in Gemini 2.5 Flash and Flash Lite in my testing, with roughly a 50% chance of it happening.
