We are looking into adding Gemini 2.0 to our workflow when it is productionized in Jan 2025. I have noticed that, many times, the model stops outputting mid-stream. The stream seems to still be running, because after a while it ends, but after the freeze I don’t receive the response any more. So basically Gemini 2.0’s response is incomplete or truncated.
Does anyone else have this issue? It happens with many types of prompts, and the problem is not consistent. Sometimes when it happens, I retry the same prompt and everything works fine.
Have you noticed what happens when the output is stopped/paused?
It’s not unusual and it has happened to me multiple times, but it should provide you with an explanation of why it has paused
This is unacceptable, and it still charges you money!!
@Caio_Jardim there is no error at all. One thing I notice: when the output is paused/stopped, after the streaming is done, the response is still incomplete/truncated, but there are a lot of blank spaces at the end of the response.
@rockmandash well it is free (experimental model)
I have used the API key with TypingMind with billing attached, and the API always times out, but it still charges me money.
Happened to us yesterday too. Maybe we hit an output token limit for experimental models, but Google would probably argue this is experimental, so it could just break.
The latter doesn’t allow true evaluation for your existing prompts so I would wait till GA for that.
Not clear why release a model if Google knows it may truncate output, though… Maybe they should allocate more processing power.
@Finsheet_Mail
I know you mention that this happens with many prompts but:
- Do you have some sample prompts you’re using where I can check the blank space?
- Are you working with long context prompts?
Hi, this is the problematic prompt: Problematic prompt - Google Docs. I tried on AI Studio using Gemini 2.0 flash and it always works there. Not sure why it doesn’t work consistently with API.
After doing some more testing, here is the additional info I found out:
- It only happens when I ask for a table output (I add this to the prompt: “- IMPORTANT: provide your response in three parts: an introduction paragraph, one data table which contains the key information, and a summary paragraph.”). Without this additional sentence, the freeze never happens.
- The freeze/pause always happens after the table header is sent back. The paused rendered result always looks like this

- It usually only happens for long prompts (around 50k-100k tokens)
- When the output is paused/stopped, after the streaming is done, the response is still incomplete/truncated, but there are a lot of blank spaces at the end of the response
- I printed out all the results here: Stream output - Google Docs. As you can see, near the end of the 1st page, there is no more output; Gemini 2.0 just responds with a long string of empty spaces. FYI, I am calling the REST API directly (https://generativelanguage.googleapis.com/v1beta/models) and collecting the streaming responses myself instead of using any libraries/packages.
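Since the failure mode is a long run of trailing whitespace, a small heuristic check on the collected stream can flag a bad response before you render or store it. This is only a sketch: the function name and the 20-character threshold are my own assumptions, not anything from the Gemini API docs.

```python
import re


def looks_truncated(text: str, min_trailing_whitespace: int = 20) -> bool:
    """Flag a collected streaming response that ends in a long run of
    whitespace (the symptom described in this thread).

    The threshold is an arbitrary guess -- tune it for your own prompts.
    """
    match = re.search(r"\s+$", text)
    return match is not None and len(match.group(0)) >= min_trailing_whitespace
```

If this returns True, retrying the same request is worth a shot, since posters here report the exact same prompt often succeeds on a second attempt.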
Does anyone know the answer to this? I tried today with the production Gemini 2.0 Flash, but the problem is still there. The problem appears for all models in the 2.0 Flash family (2.0 Flash, 2.0 Flash Lite, 2.0 Flash Thinking). However, the exact same prompt works 100% of the time with 2.0 Pro. 1.5 Flash has no problem either. I have no idea why 2.0 Flash has this problem. It is really frustrating, since I was really looking forward to using it.
I wanted to add my own experience here too. At first I was using 2.0 Experimental, and I would see that many times it would just cut out and stop before completing. Since yesterday I have been using the stable 2.0 version, and it has the exact same issues. When I use Flash 1.5, I do not get any issues.
I have similar experiences, especially when the model needs to generate Markdown tables from complex PDFs, for example. It sometimes repeatedly “hangs” and gives weird output at the start of the markdown table text. (Note I am not hitting any output limits etc.)
It happens very often with the flash model (making it almost useless) and less with the pro model.
Hello all, I’m new here… I want to help, but your original post doesn’t mention what platform/coding language you use with your API.
Honestly, I build my own web apps and always get the full response from Gemini 2.0 Flash without anything missing, even though I’m still using v1beta.
If I hit your “freeze” situation, I would write the code so it never leaves a response hanging. If I “don’t receive the response”, i.e. there is no response from Gemini at all, I would write the code to retry the same ‘parts’=>‘text’ / prompts.
I still haven’t found any information on how to send “history” to Gemini. If anyone can share where it is, I will follow the instructions to make Gemini “consistent”.
I’ll try to help if you tell me where you use your API key.
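The retry idea mentioned in this thread could be sketched like this. Everything here is hypothetical glue code: `call_model` and `is_complete` stand in for your own API call and your own completeness check (for example, checking for a long run of trailing whitespace).

```python
import time


def generate_with_retry(call_model, prompt, is_complete,
                        max_attempts=3, backoff_s=2.0):
    """Retry the same prompt when the response looks incomplete.

    `call_model(prompt)` and `is_complete(text)` are placeholders for
    your own API call and completeness check -- this is a sketch, not a
    library API.
    """
    last = None
    for attempt in range(max_attempts):
        last = call_model(prompt)
        if is_complete(last):
            return last
        time.sleep(backoff_s * (2 ** attempt))  # simple exponential backoff
    return last  # hand back the last attempt even if still incomplete
```

Since the thread reports that retrying the exact same prompt often works, even a couple of attempts with a short backoff may be enough to paper over the freezes until the underlying issue is fixed.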