Has anyone else seen the error where Gemini Pro 06-05 will sometimes get stuck in an endless loop if you use the ‘generateContentStream’ call, sending chunk after chunk and never ending until the server is killed?
The last prompt I just ran sent 750 chunks before I killed it. This wouldn't be a problem if the response needed them, e.g. a huge block of text or code - but it's repeating the same thing over and over again and never sends the STOP finishReason. I've set a thinking budget, but I don't think that's related, because it's the response stream that is getting stuck.
I'm using the TypeScript npm package (version 1.2.0).
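For reference, the call site looks roughly like this - a minimal sketch, assuming the @google/genai SDK; the model id, the prompt and the 500-chunk cap are illustrative, not our exact code:

```ts
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const stream = await ai.models.generateContentStream({
  model: 'gemini-2.5-pro-preview-06-05', // placeholder model id
  contents: '...', // placeholder prompt
});

let chunkCount = 0;
for await (const chunk of stream) {
  chunkCount++;
  process.stdout.write(chunk.text ?? '');

  // In a healthy run the final chunk carries finishReason: 'STOP'.
  // In the stuck runs no chunk ever does, so this loop never ends on
  // its own - the only way out is our own cap.
  if (chunk.candidates?.[0]?.finishReason) break;
  if (chunkCount > 500) break; // safety valve added while debugging
}
```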
Hi @Richard_Davey,
I just checked in a Colab environment with Gemini Pro 06-05 and it's working fine with streaming responses. Here is the gist. Please share your prompt so I can reproduce the issue on my side and debug further.
Thanks
Hi @Govind_Keshari - sadly it's not as simple as a single prompt. We have a set-up that involves an agent-to-agent workflow. If you have access to our transactions/logs, I'm happy to send you our project ID privately.
Some further details:
- I updated to version 1.4.0 of the npm package.
- It doesn’t always happen. There doesn’t appear to be a pattern, yet.
- In order to debug it we took your npm package and inserted logging directly into the API calls, to ensure it wasn’t happening at the application layer. Some combination of events manages to get it stuck and chunks will flow from the endpoint and never stop - a run we had today was up to 750 chunks before we aborted the server.
- To test, we swapped from generateContentStream to generateContent - and we managed to replicate the issue. It would send the same 'thought' responses over and over, with ever so slightly varying text in them - the token count was shooting up each time. In the end, it was aborted by our internal checking code, which prevents too many responses from a single call (a sketch of that guard is below, after this list).
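For completeness, the non-streaming check looked something like this (a sketch - the cap, the history placeholder and the logging are our own guard code, not SDK features):

```ts
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const history = '...'; // placeholder for the accumulated conversation
const MAX_CALLS = 25;  // our own cap; only ever tripped in the stuck runs

for (let i = 0; i < MAX_CALLS; i++) {
  const res = await ai.models.generateContent({
    model: 'gemini-2.5-pro-preview-06-05', // placeholder model id
    contents: history,
  });

  // When it gets stuck, every call returns near-identical 'thought'
  // text and the token count keeps climbing with no terminal answer.
  console.log(
    `call ${i}: finishReason=${res.candidates?.[0]?.finishReason}, ` +
      `totalTokens=${res.usageMetadata?.totalTokenCount}`,
  );
}
```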
The app architecture involves the user interacting with a Gemini Flash app. This will then package a prompt that is passed to another agent that calls Gemini Pro. This agent has access to a handful of tools, one of which calls Gemini Flash itself. The other tools all use local data.
At this point it happens so consistently, in terms of frequency (though we still can't pin down what triggers it), that we're considering removing the npm package entirely and hitting the endpoint directly to see if we can narrow it down from there.
Hey @Richard_Davey, just to confirm: can you please tell me which tools you are using? Are you using any of the tools provided by the Gemini API, like Code Execution, Function Calling, Structured Output or Grounding?
@Govind_Keshari just Function Calling - we don't use any other built-in tools. To debug this further we ported our whole app over to LangChain so we could investigate it via LangSmith, and it still happens there. For example, a typical run should use around 15k-20k tokens, but when it gets 'stuck' this happens:
You can see the 139k token monster there, taking 137 seconds to process! An hour before that we had one hit 210k tokens before LangChain aborted it! If you dig into the run it’s all going well until, randomly, it gets stuck on a single function call which takes up all of the time:
This is the same tool that is used multiple times above and below it in the flow; it's just on this one invocation that it gets stuck generating content endlessly. The tool itself is really simple - just a search/replace tool - but it does invoke Flash. A sketch of roughly what it looks like is below.
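In Gemini function-calling terms, the declaration is about as simple as it gets (a sketch; the name and parameter shape are illustrative, not our exact schema):

```ts
import { Type, type FunctionDeclaration } from '@google/genai';

// Sketch of the search/replace tool declaration. The handler behind it
// is what makes the internal call to Gemini Flash.
const searchReplace: FunctionDeclaration = {
  name: 'search_replace',
  description:
    'Replace the first occurrence of `search` with `replace` in the given file.',
  parameters: {
    type: Type.OBJECT,
    properties: {
      file: { type: Type.STRING, description: 'Path of the file to edit' },
      search: { type: Type.STRING, description: 'Text to find' },
      replace: { type: Type.STRING, description: 'Replacement text' },
    },
    required: ['file', 'search', 'replace'],
  },
};
```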
I'm happy to give you access to our LangSmith project if it would help!
Cheers,
Rich
Hey @Richard_Davey, Thanks for sharing the details. I will escalate this with our team.
I think I may be hitting the same issue. We are testing Gemini as a UI generator, and it uses our tools just fine to build the app/UI, but then when it’s done and all that’s left is to summarise the work it has done, I get the same phrase streamed endlessly:
The shadcn button has been successfully added to the dummy web page. The button is centered on the screen. The build was successful, so everything is working as expected. I have finished my work. I will now hand you over to the host. The build was successful. I have finished my work and the result is available at the following link: (link to localhost) I will now hand you over to the host. The build was successful. I have finished my work and the result is available at the following link: (link to localhost) I will now hand you over to the host. The build was successful. I have finished my work and the result is available at the following link: (link to localhost) I will now hand you over to the host. The build was successful. I have finished my work and the result is available at the following link: (link to localhost) I will now hand you over to the host. The build was successful. I have finished my work and the result is available at the following link: (link to localhost) I will now hand you over to the host.
I haven’t waited to see how long this will go on if I don’t stop it, but it seems to at least want to continue for several minutes.
Yes, it's 100% a Gemini issue. If we do nothing other than swap the model to OpenAI o3-2025-04-16, it never happens. Swap back to Gemini Pro and, a lot of the time (though not every time), it gets stuck streaming the same content over and over - or worse, calling the same tool over and over - either until we kill the server or it exhausts maxIterations in LangChain.
It’s a nightmare to debug 
Very similar situation here. Our UI generator supports Anthropic and OpenAI as well, and we don't have these issues with them. Most of the code is shared between the three providers, but we have an adapter for each to handle API differences, so it could of course be that we're doing something wrong in the Gemini adapter. However, I haven't found anything in the documentation suggesting we should be doing anything differently to avoid this (except prompt tuning, of course - but it feels like it shouldn't be doing this no matter what the prompt is).
If you find a prompt that prevents it, please share it here! We've given up after exhausting every sensible approach we could think of.
One thing we did observe: setting the thinkingBudget made no difference, so it's something more fundamental / deeper in the service layer.
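(For reference, we were setting it like this - a sketch; the budget value is arbitrary:)

```ts
const prompt = '...'; // placeholder

const res = await ai.models.generateContent({
  model: 'gemini-2.5-pro-preview-06-05', // placeholder model id
  contents: prompt,
  config: {
    // We tried various values here; none of them stopped the looping.
    thinkingConfig: { thinkingBudget: 1024 },
  },
});
```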
Not sure if you’re using Gemini Flash or Pro, but we observe it mostly with Pro.
@Govind_Keshari did the team get anywhere with this at all? 