Summary:
When generating long texts with the "gemini-1.5-flash" model, the model often falls into a repetitive sequence of tokens, looping until the output token limit is exhausted. This issue is observed with both the Vertex AI and Gemini APIs.
Example:

```
"The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly. The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly. The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly. The judgment can be appealed…"
```
Steps to Reproduce:
Use the "gemini-1.5-flash" model via the Vertex AI or Gemini API.
Generate a long text (e.g., legal or technical document).
Observe the generated output for repetition of phrases or sentences.
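Assuming the issue reproduces through the public Gemini REST endpoint, the reproduction can be sketched as follows. The endpoint and payload shape follow the public `models.generateContent` API; the prompt text and token limit are placeholders, not values from the original report:

```python
import json
import urllib.request

API_URL = ("https://generativelanguage.googleapis.com/v1beta/"
           "models/gemini-1.5-flash:generateContent")

def build_request(prompt: str, max_output_tokens: int = 8192) -> dict:
    """Build a generateContent payload for a long-text generation task."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {"maxOutputTokens": max_output_tokens},
    }

def generate(prompt: str, api_key: str) -> str:
    """Send the request; the repetition typically shows up in long outputs."""
    req = urllib.request.Request(
        f"{API_URL}?key={api_key}",
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["candidates"][0]["content"]["parts"][0]["text"]
```

With a long input document as the prompt, inspecting the returned text for repeated sentences reproduces the behavior described above.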
Expected Behavior:
The model should generate coherent and non-repetitive text.
Actual Behavior:
The model begins to repeat sequences of tokens indefinitely, leading to the maximum token limit being reached.
Impact:
Wastes tokens and API usage limits.
Generates unusable text, necessitating additional requests and costs.
Reproduction Rate:
Occurs frequently with long text generation tasks.
Workaround:
Currently, there is no known workaround to prevent this issue.
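The loop cannot be stopped server-side, but a client-side check can at least flag a looping response so it is truncated or discarded instead of being passed downstream. A minimal sketch (the function names are illustrative, not part of any SDK, and the detection heuristic only catches a phrase repeated verbatim at the end of the text):

```python
from typing import Optional

def find_repeated_tail(text: str, min_len: int = 30) -> Optional[str]:
    """Return the phrase the text ends by repeating, if any.

    Looks for a suffix of at least `min_len` characters that occurs
    at least three times in a row at the end of the text.
    """
    n = len(text)
    for size in range(min_len, n // 3 + 1):
        phrase = text[n - size:]
        if text.endswith(phrase * 3):
            return phrase
    return None

def truncate_repetition(text: str, min_len: int = 30) -> str:
    """Cut a looping response back to a single copy of the repeated phrase."""
    phrase = find_repeated_tail(text, min_len)
    if phrase is None:
        return text
    while text.endswith(phrase + phrase):
        text = text[: len(text) - len(phrase)]
    return text
```

This does not recover the tokens already billed; it only keeps the looping tail out of downstream processing.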
Request for Resolution:
Investigate the cause of the repetitive token generation.
Implement a fix to prevent the model from entering a repetitive loop.
Provide a mechanism for users to request refunds or credits for tokens wasted due to this bug.
We are experiencing the same issue.
Hi @ruggiero.guida @rossanodr,
Can you provide a prompt so that I can replicate the same?
Thanks!
Thanks @Siva_Sravana_Kumar_N
The prompt contains private and sensitive information and we are not comfortable sharing it on a public forum. Would you be able to DM me a Google work email so we can send the info there?
It’s happening with many different prompts.
Unfortunately, I think the problem is with Gemini itself. It is happening with many different prompts; the main trigger seems to be a large context. Say your prompt is something like, “Read the document below and make a list of all the birthday dates in it: {list}”. If the document is large, there is a chance the model starts repeating the same date until it reaches the token limit.
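If the large context is indeed the trigger, one possible mitigation (an assumption on my part, not a confirmed fix) is to split the document into smaller overlapping chunks, run the extraction prompt on each chunk separately, and deduplicate the merged results. The splitting step is plain Python; the sizes are illustrative, not API limits:

```python
def chunk_text(document: str, max_chars: int = 20_000,
               overlap: int = 500) -> list:
    """Split a long document into overlapping chunks.

    The overlap keeps items that straddle a chunk boundary visible
    in at least one chunk.
    """
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    chunks = []
    start = 0
    while start < len(document):
        chunks.append(document[start:start + max_chars])
        start += max_chars - overlap
    return chunks
```

Each chunk is then sent with the same extraction prompt, and the per-chunk lists are merged and deduplicated; in my experience shorter contexts make the repetition loop less likely, though this has not been confirmed by Google.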
Gemini Flash 1.5
prompt =

```
'what we understand though is that nothing has has been decided and everything is really in the sort of preliminary stages but i think you know as the market is showing us today this is kind of a healthy and a natural thing for a company in this kind of a situation to be doing well we called it plan b but it might as well be plan cde e you know things have been difficult for intel we use the word chip maker a lot for all kinds of semiconductor companies but in intel's case it's actually true you split the business in two parts they design chips and what they do and then they manufacture them for themselves the problem they've got is that they don't currently manufacture chips really for anyone else and that's the financial issue yeah it's a it's a conundrum right they they're saying look we want to be a a foundry right we want to do what tsmc does in order to do that we need more factories we need more technology but that costs a lot of money the money comes from the products that they s...'
Process the above text according to the following steps:
Step 1. Restore only the punctuation that is missing from the original text:
- Maintaining the original word order as much as possible
- Each sentence should be on a separate line.
Step 2. Translate each sentence into Chinese one by one
- Predict what type of content this is, and then translate it according to that type.
```
The screenshot: (attachment not reproduced here)