Gemini-2.0-flash-thinking-exp-01-21 and Gemini-2.0-flash are not working as expected. The responses either fall into a repetition loop or produce an error message stating, "An internal error has occurred."
Welcome to the forum.
Actually, both models frequently work as expected. Can you show the prompt that is triggering the unwanted behavior? Make sure to remove any sensitive information from the prompt.
I frequently use AI Studio and the API with these specific models to process PDFs. Lately, I've been getting unstable responses from a pipeline I've used for a while, along with frequent 503 UNAVAILABLE errors, even though I haven't exceeded the API limits. To verify, I uploaded sample PDFs directly to AI Studio with the simple prompt "Extract entity names and avoid repetition." I turned off all safety settings, tried different temperatures and various PDFs, then re-ran the tests on both models. Below are the results from a couple of minutes ago from Gemini-2.0-flash-thinking-exp-01-21; I experienced the same problem with gemini-2.0-flash. I can share more if needed.
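For reference, here is roughly how I call the models outside AI Studio, a minimal sketch against the public REST `generateContent` endpoint. The model name, prompt, and key handling are placeholders; my actual pipeline also uploads the PDF, which I've omitted here:

```python
# Minimal reproduction sketch: call generateContent with all safety
# categories set to BLOCK_NONE (mirroring "safety settings off" in
# AI Studio) and an adjustable temperature. Placeholders throughout.
import json
import urllib.request

MODEL = "gemini-2.0-flash"  # also tested: gemini-2.0-flash-thinking-exp-01-21
API_URL = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

# The four standard harm categories, all disabled.
SAFETY_OFF = [
    {"category": c, "threshold": "BLOCK_NONE"}
    for c in (
        "HARM_CATEGORY_HARASSMENT",
        "HARM_CATEGORY_HATE_SPEECH",
        "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "HARM_CATEGORY_DANGEROUS_CONTENT",
    )
]

def build_request(prompt: str, temperature: float) -> dict:
    """Assemble the generateContent request body."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"temperature": temperature},
        "safetySettings": SAFETY_OFF,
    }

def run(prompt: str, api_key: str, temperature: float = 0.0) -> str:
    """Send the request and return the first candidate's text."""
    body = json.dumps(build_request(prompt, temperature)).encode()
    req = urllib.request.Request(
        f"{API_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["candidates"][0]["content"]["parts"][0]["text"]
```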
This has started happening to me with Gemini 2.5 Pro as well.
I tried this prompt: "Extract and list unique entity names from the document". It uses positive prompting (telling the model what it should do), which is generally considered more effective than negative prompting (telling the model what to avoid doing). The document I gave it is
Gemini 2.0 Flash Thinking Experimental 01-21 response
Here are the unique entity names from the document:
- European Commission
- US (United States)
- European Union
- Trump Administration
- DG TRADE
- Member States
- Europe Direct
- Olof Gill
- Ana Apse-Paese
The thoughts log shows the model followed a two-step plan: first, it reads the document and identifies organization names, product names, and locations.
“Let’s start reading the text and identify the entities.”
In the second step, it says
“Let’s review the list and remove duplicates and non-entity names.”
And then it gets to the final answer part, where it says “All of these seem to be valid entities. I will list them as requested.”
I recommend using the positive prompting approach. Hope that helps.
Positive prompting was my initial approach; I tried negative prompting only after observing the problem. Please try with longer documents (more than 60k tokens). Here is the output of the latest run (Gemini-2.0-flash-thinking-exp-01-21) with this exact prompt, "Extract and list unique entity names from the document", and a document of ~65k tokens:
and (Gemini-2.0-flash):
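Since I'm running these tests repeatedly, I also flag looping runs automatically rather than eyeballing each output. This is just a crude heuristic with an arbitrary threshold, but it catches the failure mode shown above:

```python
# Heuristic repetition-loop detector: flag an output whose most common
# non-empty line repeats many times. The threshold is arbitrary and
# will need tuning for documents that legitimately repeat entries.
from collections import Counter

def looks_like_repetition_loop(text: str, min_repeats: int = 10) -> bool:
    """Return True if any single line appears at least min_repeats times."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return False
    _, count = Counter(lines).most_common(1)[0]
    return count >= min_repeats
```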
I tried the Oil2024 IEA report with the same positive prompt. It generated a long list of companies, countries, organizations, and people, then stopped with a dangling bullet point. I then used "continue" and it generated more oil companies. It all looks reasonable.
The IEA document is 40k tokens. It's probably some feature of the document class you're using that is triggering the behavior.
I have an idea: perhaps you can try subdividing your document into parts to pinpoint where the problem happens. De-duplication also gets harder for a model as the list grows; at some point, traditional approaches such as a relational database with an index will clearly outperform a large language model.
Thank you for your suggestion. I will certainly keep experimenting, as I have been doing. (But pinpointing the problem in the document assumes the issue lies within the document itself; I'm not entirely sure it does, since these issues have occurred across multiple very different documents and tasks.)
Please also note that the prompt I shared is a simplified example intended to recreate the issue I observed. As I mentioned previously, I've been working frequently with long PDFs and contexts using Gemini models for various tasks, and until recently I had not encountered this behavior. However, during my latest tests on several lengthy sample PDFs (not just one), I observed these repetition loops, frequently accompanied by errors.
I don't expect perfect output, and duplication is not a problem in itself. But the large context window is the main advantage Gemini models offer, and encountering these repetition loops, which can run for a very long time without producing a proper response, along with errors when handling lengthy inputs (not only with the thinking models, but also with Gemini-2.0-flash), disrupts the processing pipelines I've been using.
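For what it's worth, the 503 UNAVAILABLE errors are the one part I've been able to work around in the pipeline: since they're transient, wrapping each call in exponential backoff with jitter recovers most runs. A minimal sketch (in practice you'd catch the client library's specific unavailable/overloaded exception rather than bare Exception):

```python
# Sketch: retry a flaky API call with exponential backoff plus jitter.
# Re-raises the last error once max_attempts is exhausted.
import random
import time

def call_with_backoff(fn, max_attempts: int = 5, base_delay: float = 1.0):
    """Call fn(); on failure, wait base_delay * 2**attempt (+ jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:  # narrow this to the 503/UNAVAILABLE error in real code
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

This doesn't help with the repetition loops, though, since those runs complete "successfully" from the API's point of view.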