Has anyone else seen the error where Gemini Pro 06-05 will sometimes get stuck in an endless loop if you use the ‘generateContentStream’ call, sending chunk after chunk and never ending until the server is killed?
The last prompt I just ran sent 750 chunks before I killed it. This wouldn't be a problem if the response needed them, e.g. a huge block of text or code - but it's repeating the same thing over and over again and never sends the STOP finishReason. I've set a thinking budget, but I don't think that's related, because it's the response stream that is getting stuck.
I'm using the TypeScript npm package (version 1.2.0).
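For reference, the call site looks roughly like this - a minimal sketch, assuming the @google/genai SDK; the model id, the prompt and the 500-chunk cap are illustrative, not our exact code:

```ts
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const stream = await ai.models.generateContentStream({
  model: 'gemini-2.5-pro-preview-06-05', // placeholder model id
  contents: '...', // placeholder prompt
});

let chunkCount = 0;
for await (const chunk of stream) {
  chunkCount++;
  process.stdout.write(chunk.text ?? '');

  // In a healthy run the final chunk carries finishReason: 'STOP'.
  // In the stuck runs no chunk ever does, so this loop never ends on
  // its own - the only way out is our own cap.
  if (chunk.candidates?.[0]?.finishReason) break;
  if (chunkCount > 500) break; // safety valve added while debugging
}
```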
Hi @Richard_Davey,
I just checked in a Colab environment with Gemini Pro 06-05 and it's working fine with streaming responses. Here is the gist. Please share your prompt so I can reproduce the issue on my side and debug further.
Thanks
Hi @Govind_Keshari - sadly it's not as simple as a single prompt. We have a set-up that involves an agent-to-agent workflow. If you have access to our transactions/logs, I'm happy to send you our project ID privately.
Some further details:
- I updated to version 1.4.0 of the npm package.
- It doesn’t always happen. There doesn’t appear to be a pattern, yet.
- In order to debug it we took your npm package and inserted logging directly into the API calls, to ensure it wasn’t happening at the application layer. Some combination of events manages to get it stuck and chunks will flow from the endpoint and never stop - a run we had today was up to 750 chunks before we aborted the server.
- To test, we swapped from generateContentStream to generateContent - and we managed to replicate the issue. It would send the same 'thought' responses over and over, with ever so slightly varying text in them - the token count was shooting up each time. In the end, it was aborted by our internal checking code, which prevents too many responses from a single call (a sketch of that guard is below, after this list).
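For completeness, the non-streaming check looked something like this (a sketch - the cap, the history placeholder and the logging are our own guard code, not SDK features):

```ts
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const history = '...'; // placeholder for the accumulated conversation
const MAX_CALLS = 25;  // our own cap; only ever tripped in the stuck runs

for (let i = 0; i < MAX_CALLS; i++) {
  const res = await ai.models.generateContent({
    model: 'gemini-2.5-pro-preview-06-05', // placeholder model id
    contents: history,
  });

  // When it gets stuck, every call returns near-identical 'thought'
  // text and the token count keeps climbing with no terminal answer.
  console.log(
    `call ${i}: finishReason=${res.candidates?.[0]?.finishReason}, ` +
      `totalTokens=${res.usageMetadata?.totalTokenCount}`,
  );
}
```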
The app architecture involves the user interacting with a Gemini Flash app. This will then package a prompt that is passed to another agent that calls Gemini Pro. This agent has access to a handful of tools, one of which calls Gemini Flash itself. The other tools all use local data.
At this point it happens so consistently, in terms of frequency (though we still can't pin down what triggers it), that we're considering removing the npm package entirely and hitting the endpoint directly to see if we can narrow it down from there.
Hey @Richard_Davey, just to confirm: can you please tell me which tools you are using? Are you using any of the tools provided by the Gemini API, like Code Execution, Function Calling, Structured Output or Grounding?
@Govind_Keshari just Function Calling - we don't use any other built-in tools. To debug this further we ported our whole app over to LangChain so we could investigate it via LangSmith, and it still happens there. For example, a typical run should use around 15k-20k tokens, but when it gets 'stuck' this happens:
You can see the 139k token monster there, taking 137 seconds to process! An hour before that we had one hit 210k tokens before LangChain aborted it! If you dig into the run it’s all going well until, randomly, it gets stuck on a single function call which takes up all of the time:
This is the same tool that is used multiple times above and below it in the flow; it's just on this one invocation that it gets stuck generating content endlessly. The tool itself is really simple - just a search/replace tool - but it does invoke Flash. A sketch of roughly what it looks like is below.
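In Gemini function-calling terms, the declaration is about as simple as it gets (a sketch; the name and parameter shape are illustrative, not our exact schema):

```ts
import { Type, type FunctionDeclaration } from '@google/genai';

// Sketch of the search/replace tool declaration. The handler behind it
// is what makes the internal call to Gemini Flash.
const searchReplace: FunctionDeclaration = {
  name: 'search_replace',
  description:
    'Replace the first occurrence of `search` with `replace` in the given file.',
  parameters: {
    type: Type.OBJECT,
    properties: {
      file: { type: Type.STRING, description: 'Path of the file to edit' },
      search: { type: Type.STRING, description: 'Text to find' },
      replace: { type: Type.STRING, description: 'Replacement text' },
    },
    required: ['file', 'search', 'replace'],
  },
};
```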
I'm happy to give you access to our LangSmith project if it would help!
Cheers,
Rich
Hey @Richard_Davey, Thanks for sharing the details. I will escalate this with our team.
I think I may be hitting the same issue. We are testing Gemini as a UI generator, and it uses our tools just fine to build the app/UI, but then when it’s done and all that’s left is to summarise the work it has done, I get the same phrase streamed endlessly:
The shadcn button has been successfully added to the dummy web page. The button is centered on the screen. The build was successful, so everything is working as expected. I have finished my work. I will now hand you over to the host. The build was successful. I have finished my work and the result is available at the following link: (link to localhost) I will now hand you over to the host. The build was successful. I have finished my work and the result is available at the following link: (link to localhost) I will now hand you over to the host. The build was successful. I have finished my work and the result is available at the following link: (link to localhost) I will now hand you over to the host. The build was successful. I have finished my work and the result is available at the following link: (link to localhost) I will now hand you over to the host. The build was successful. I have finished my work and the result is available at the following link: (link to localhost) I will now hand you over to the host.
I haven’t waited to see how long this will go on if I don’t stop it, but it seems to at least want to continue for several minutes.
Yes, it's 100% a Gemini issue. If we do nothing other than swap the model to OpenAI o3-2025-04-16, it never happens. Swap back to Gemini Pro and, a lot of the time (though not every time), it gets stuck streaming the same content over and over - or worse, calling the same tool over and over - either until we kill the server or it exhausts maxIterations in LangChain.
It’s a nightmare to debug 
Very similar situation here. Our UI generator supports Anthropic and OpenAI as well, and we don't have these issues with them. Most of the code is shared between the three providers, but we have an adapter for each to handle API differences, so it could of course be that we're doing something wrong in the Gemini adapter. However, I haven't found anything in the documentation suggesting we should be doing anything differently to avoid this (except prompt tuning, of course - but it feels like it shouldn't be doing this no matter what the prompt is).
If you find a prompt that prevents it, please share it here! We've given up after exhausting every sensible approach we could think of.
One thing we did observe: setting the thinkingBudget made no difference, so it's something more fundamental / deeper in the service layer.
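(For reference, we were setting it like this - a sketch; the budget value is arbitrary:)

```ts
const prompt = '...'; // placeholder

const res = await ai.models.generateContent({
  model: 'gemini-2.5-pro-preview-06-05', // placeholder model id
  contents: prompt,
  config: {
    // We tried various values here; none of them stopped the looping.
    thinkingConfig: { thinkingBudget: 1024 },
  },
});
```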
Not sure if you’re using Gemini Flash or Pro, but we observe it mostly with Pro.
@Govind_Keshari did the team get anywhere with this at all? 