We are looking into adding Gemini 2.0 to our workflow when it is productionized in Jan 2025. I have noticed that, many times, the model stops outputting mid-stream. The stream seems to still be running, because after a while it ends, but after the freeze I don’t receive the response any more. So basically Gemini 2.0’s response is incomplete or truncated.
Does anyone else have this issue? It happens with many types of prompts, and the problem is not consistent. Sometimes when it happens, I retry the same prompt and everything works fine.
Have you noticed what happens when the output is stopped/paused?
It’s not unusual and it has happened to me multiple times, but it should provide you with an explanation of why it has paused
This is unacceptable, and it still charges you money!!
@Caio_Jardim there is no error at all. One thing I notice: when the output is paused/stopped, after the streaming is done, the response is still incomplete/truncated, but there are a lot of blank spaces at the end of the response.
@rockmandash well it is free (experimental model)
I have used the API key with TypingMind with billing attached, and the API always times out, but it still charges me money.
Happened to us yesterday too. Maybe we hit an output token limit for experimental models, but Google would probably argue this is experimental, so it could just break.
The latter doesn’t allow true evaluation for your existing prompts so I would wait till GA for that.
Not clear why release a model if Google knows it may truncate output, though… Maybe they should allocate more processing power.
@Finsheet_Mail
I know you mention that this happens with many prompts but:
- Do you have some sample prompts you’re using where I can check the blank space?
- Are you working with long context prompts?
Hi, this is the problematic prompt: Problematic prompt - Google Docs. I tried on AI Studio using Gemini 2.0 flash and it always works there. Not sure why it doesn’t work consistently with API.
After doing some more testing, here is the additional info I found out:
- It only happens when I ask for a table output (I add this to the prompt: “- IMPORTANT: provide your response in three parts: an introduction paragraph, one data table which contains the key information, and a summary paragraph.”). Without this additional sentence, the freeze never happens.
- The freeze/pause always happens after the table header is sent back. The paused rendered result always looks like this

- It usually only happens for long prompts (around 50k-100k tokens)
- When the output is paused/stopped, after the streaming is done, the response is still incomplete/truncated, but there are a lot of blank spaces at the end of the response
- I printed out all the results here: Stream output - Google Docs. As you can see, near the end of the 1st page, there is no more output; Gemini 2.0 just responds with a long string of empty spaces. FYI, I am calling the REST API directly (https://generativelanguage.googleapis.com/v1beta/models) and collecting the streaming responses myself instead of using any libraries/packages.
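Since the failure mode is a long run of trailing whitespace, a small heuristic check on the collected stream can flag a bad response before you render or store it. This is only a sketch: the function name and the 20-character threshold are my own assumptions, not anything from the Gemini API docs.

```python
import re


def looks_truncated(text: str, min_trailing_whitespace: int = 20) -> bool:
    """Flag a collected streaming response that ends in a long run of
    whitespace (the symptom described in this thread).

    The threshold is an arbitrary guess -- tune it for your own prompts.
    """
    match = re.search(r"\s+$", text)
    return match is not None and len(match.group(0)) >= min_trailing_whitespace
```

If this returns True, retrying the same request is worth a shot, since posters here report the exact same prompt often succeeds on a second attempt.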
Does anyone know the answer to this? I tried today with the production Gemini 2.0 Flash, but the problem is still there. The problem appears for all models in the 2.0 Flash family (2.0 Flash, 2.0 Flash Lite, 2.0 Flash Thinking). However, the exact same prompt works 100% of the time with 2.0 Pro. 1.5 Flash has no problem either. I have no idea why 2.0 Flash has this problem. It is really frustrating, since I was really looking forward to using it.
I wanted to add my own experience here too. At first I was using 2.0 Experimental, and I would see that many times it would just cut out and stop before completing. Since yesterday I have been using the stable 2.0 version, and it has the exact same issues. When I use Flash 1.5, I do not get any issues.
I have similar experiences, especially when the model needs to generate Markdown tables from complex PDFs, for example. It sometimes repeatedly “hangs” and gives weird output at the start of the markdown table text. (Note I am not hitting any output limits etc.)
It happens very often with the flash model (making it almost useless) and less with the pro model.
Hello all, I’m new here… I want to help, but your original post doesn’t mention what platform/coding language you use with your API.
Honestly, I build my own web apps and always get the full response from Gemini 2.0 Flash without anything missing, even though I’m still using v1beta.
If I hit your “freeze” situation, I would write the code so it never leaves a response hanging. If I “don’t receive the response”, i.e. there is no response from Gemini at all, I would write the code to retry the same ‘parts’=>‘text’ / prompts.
I still haven’t found any information on how to send “history” to Gemini. If anyone can share where it is, I will follow the instructions to make Gemini “consistent”.
I’ll try to help if you tell me where you use your API key.
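The retry idea mentioned in this thread could be sketched like this. Everything here is hypothetical glue code: `call_model` and `is_complete` stand in for your own API call and your own completeness check (for example, checking for a long run of trailing whitespace).

```python
import time


def generate_with_retry(call_model, prompt, is_complete,
                        max_attempts=3, backoff_s=2.0):
    """Retry the same prompt when the response looks incomplete.

    `call_model(prompt)` and `is_complete(text)` are placeholders for
    your own API call and completeness check -- this is a sketch, not a
    library API.
    """
    last = None
    for attempt in range(max_attempts):
        last = call_model(prompt)
        if is_complete(last):
            return last
        time.sleep(backoff_s * (2 ** attempt))  # simple exponential backoff
    return last  # hand back the last attempt even if still incomplete
```

Since the thread reports that retrying the exact same prompt often works, even a couple of attempts with a short backoff may be enough to paper over the freezes until the underlying issue is fixed.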