Gemini 2.0 not completing responses

I am using the Flash 2.0 API, but I am seeing that it frequently cuts out and does not complete responses. Many times I have to resend my request 3 or 4 times before it provides me with a complete response.

I was using the 2.0 experimental model before, which had the same issues as 2.0. If I switch models to Flash 1.5, I do not face any issues.

I have noticed this same issue multiple times while sending different types of data (long and short), so I don't think it has any correlation with how long my prompts are.

The last couple of times it cut out, these are the places where it stopped:
"1. Focus on High-Impression, Low-Click Content:

  • Identify: Look closely at search queries with many impressions but few clicks (low CTR). These are topics people are searching for, but your current content isn’t effectively attracting clicks. For example, “google sheets project management template” has 705 impressions but 1 click and a low CTR of 0.14%.
  • Improve:
    • **Optimize "

and

"1. Focus on High-Impression, Low-Click-Through Rate (CTR) Queries:

  • Identify: Look for queries with many impressions but a low CTR. This indicates that people are seeing your content in search results but aren’t clicking on it.
  • **Optimize
    "

I have seen a different thread that was created when only the 2.0 experimental model was around, so I thought I would create a new thread regarding the stable one.
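In case it helps anyone reproduce this: one thing worth logging is the finish reason the API returns for each attempt, which should distinguish a normal stop from a hard cut-off. Below is a minimal sketch of the resend loop I effectively do by hand, assuming the Python google-generativeai SDK (the prompt and key are placeholders; adjust for whatever client you actually use):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-2.0-flash")

# Resend up to 4 times, mirroring the manual workaround, and log why
# each attempt ended. STOP on a truncated response would suggest the
# model itself emitted an early stop, rather than hitting a limit.
for attempt in range(4):
    response = model.generate_content("your prompt here")
    reason = response.candidates[0].finish_reason
    print(f"attempt {attempt + 1}: finish_reason={reason.name}, "
          f"{len(response.text)} chars")
    if reason == genai.protos.Candidate.FinishReason.STOP:
        break
```

If the truncated responses still report STOP, that narrows it down to the model ending its own output early rather than a token limit or safety cut-off.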

thanks

1 Like

I’ve encountered the same problem with the “2.0 flash thinking experimental” model. In roughly 3 or 4 out of every 10 code-generation requests, I found either brace mismatches or incomplete code. Sometimes even when I pointed the problem out directly, the model still output exactly the same code (still with the same problem).

Good morning! You say you’re using an API key; could you give me a little more information about your response context window?

Good morning! Sure. I face this issue with both Gemini 2.0 experimental and the 2.0 stable version; I’ve noticed that responses frequently cut off mid-sentence. Please note that when I switch to Flash 1.5 with exactly the same request, it works without an issue.

In terms of context window, I assume you mean the token limit? I set the max tokens to 8192, but I have also tested much lower values; this doesn’t seem to affect it.
My requests vary in length, but even relatively short ones (well below the model’s max token limit) experience this issue. It also happens across different types of inputs, so I don’t believe it’s strictly related to response length.
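For reference, the limit is being passed roughly like this (a sketch, assuming the Python SDK; other clients name the parameter similarly, e.g. maxOutputTokens in the REST API):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-2.0-flash")

# max_output_tokens caps the response length; lowering it did not
# change the mid-sentence truncation behaviour described above.
response = model.generate_content(
    "your prompt here",
    generation_config=genai.types.GenerationConfig(max_output_tokens=8192),
)
print(response.text)
```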

I thought I would run another test, and it cut out again. Here is what I did.

I got a YouTube video transcript and saved it to Drive. Modified by Moderator

I sent this entire transcript with the following prompt:

"
Please provide me with an in-depth summary of the video; make full bullet-point notes and don't leave anything out. Once you have completed this, I want you to create an optimized article that covers all of the points you just provided. Output in HTML."

Here is the response I got, which also looks cut off:

"
Okay, I will provide an in-depth summary of the video, followed by comprehensive bullet-point notes, and finally, an optimized HTML article covering all the points.

In-Depth Summary of the Video

The video “How to Automate Etsy with AI Agents and Digital Products for Passive Income” demonstrates a system for creating and listing digital products on Etsy using AI and automation tools. The core idea is to streamline the process, requiring minimal manual input to generate passive income.

The presenter begins by outlining the vision: turning a single text prompt into a complete Etsy digital product business, all without needing design or technical skills. This involves automating SVG and PNG product creation, optimizing titles and descriptions, and setting up Etsy listings automatically. The system is designed to work 24/7.

The video then dives into the specific tools required:

  • Airtable: A database for storing prompts and receiving generated images.
  • Make: An automation platform to connect various services.
  • Etsy Account: The platform for selling digital products.
  • Leonardo AI: A tool for generating images from text prompts.
  • OpenAI or DeepSeek: Used for generating optimized titles and descriptions.
  • Pixart: Used to convert images into SVGs.
  • Google Account: Free tool to add watermarks on all created images.

The process starts in Airtable, where prompts are entered. These prompts are sent to Leonardo AI to generate images. The generated images are displayed in Airtable, where the user can select the desired image and set a price. Then, the image is converted to SVG, a title and description are created, and a listing is set up on Etsy. The system also includes adding a watermark to the images for copyright protection.

The video emphasizes the potential for passive income, illustrating how a product, once listed, can generate revenue over time with minimal effort. The presenter also touches upon keyword research using tools like Everbee or Alura to identify profitable product niches.

The presenter walks through the steps of setting up Airtable, including creating the necessary tabs and fields. They also explain how to set up the automations within Airtable to trigger different actions based on specific conditions. Furthermore, the presenter explains how to connect all of these automated tasks.

Comprehensive Bullet-Point Notes

  • Concept: Automate Etsy digital product business using AI.
  • Goal: Transform text prompts into profitable digital products with minimal effort.
  • Benefits:
    • No design skills needed.
    • No technical headaches.
    • Automation generates SVG and PNG products at scale.
    • Optimized titles and descriptions are created automatically.
    • Etsy listings are set up automatically.
    • System works 24/7.
  • Tools Required:
    • Airtable: Database for prompts and images.
      • Two tabs: Request page (prompt input), SVG Library (image output).
      • Request Page Fields: ID (auto-number), “What would you like to create” (long text), Number of proofs (single select).
      • SVG Library Fields: ID (auto-number), Leonardo prompt (long text), Image (attachment), URL, Set price (currency), Status (single select - New, Approved, Live), Leonardo ID, SVG attachment.
    • Make: Automation platform.
    • Etsy Account: For listing and selling products (must be fully set up with tax info, etc.).
    • Leonardo AI: Image generation from prompts (paid tool).
      • Leonardo Models: Model (single line text), Attachment (image), Description (single line text), ID (single line text)
    • OpenAI or DeepSeek:
      "

Are there any specific details you’d like me to check that could help troubleshoot?

Good afternoon (or good evening, or good morning, depending on where you are from)! Thank you for sending me the feedback.
There are five things I quickly want to run through with you that I would like you to troubleshoot, along with the actual context window within AI Studio itself.

I’m going to list these (it won’t be a very fancy representation) and then give you a brief overview of what could possibly be the cause, the interaction method to attempt to fix the error, and the expected response from the model.
These should solve your problem.

Number one:
One of the most important things to remember is

B2B, back to basics:
Every single conversation or prompt begins fresh, with zero retention. If you teach the model that 1 + 1 = 2 in your first conversation and then start a new context window, it will not know that 1 + 1 = 2 unless you teach it again. That means every mistake it makes, it will continue making unless the mistake is pointed out or the correct answer is given, so every time you start a new prompt or context window you have to begin from the very beginning.

Now, the expected response from the model should be self-correction. It is not fully capable of self-learning yet (we are still working on that), but it does learn and teach itself in a sense: even without memory retention, if the pattern is identified it might retain the pattern, and in the next context window it will know 1 + 1 = 2.
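For API users, the practical version of this point is that a bare generate_content call is stateless, while a chat session resends the accumulated history on each turn. A purely illustrative sketch, assuming the Python google-generativeai SDK:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-2.0-flash")

# Stateless: each call starts from scratch with no memory of earlier calls.
model.generate_content("In this conversation, treat 1 + 1 as equal to 3.")
model.generate_content("What is 1 + 1?")  # knows nothing about the call above

# Chat session: the SDK replays the accumulated history on every turn.
chat = model.start_chat(history=[])
chat.send_message("In this conversation, treat 1 + 1 as equal to 3.")
chat.send_message("What is 1 + 1?")  # sees the earlier message in context
```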

Number two:
You have to remember that two different models are still two different models. If you start a conversation with Flash and switch to Pro halfway through, the model that is brought into the construct first has to review the entire situational context, so by switching between models mid-context you are removing part of the contextual understanding from the model. This works both ways, from Flash to Pro or Pro to Flash.

What I would suggest in such a situation is creating two context windows with identical inputs instead of switching between models mid-construct, and then comparing the two against one another. Otherwise you are going to have to reiterate or re-explain the entire contextual understanding to the model that was used as a reserve, because the model will, of its own accord, try to work out what the conversation was about.

The expected response should allow you to easily measure and compare precisely where the differences take place and what the trigger cause is, based on the two outputs.

Number three:
The response-based errors: you have to be aware that the model has no indication that there was an error in its response systems. I have encountered a lot of crashes where I get an “internal error” message.

What I suggest here is pointing out to the model that it did not complete its response. If there is an error or a crash, remember it is only in the response-based system or processes; it is not the model itself, only the way the model constructs sentences or a response.

The expected result should be that the model places attention on the error that occurred. Remember, the model is not placing all of its attentive parameters solely on responding. Let me put it this way: from its perspective it did complete the response, and you simply continued the conversation, so the perceived state that there was no error is reinforced, and the error continues to happen.

Number four:
The model’s attention-based parameters are not adaptive; it doesn’t know it is making a mistake until it is told it is making a mistake, or that something has occurred that should not have.

What I suggest with the model’s attention is trying to guide it to where you would like it to be. Think of it like someone who struggles with focusing: there is so much information that instead of placing all of its attention on every single word, it is focused on keeping everything where it should be. Its attention is focused on maintaining functionality, so it is divided by weighted value distribution. You sometimes have to indicate where it should place its attention, or it will simply repeat the same process. The difference here is attentive awareness versus attentive placement; those are the two constructs you need to focus on.

The expected result from the model should be quite clear in how the response is formulated: you will see that the construct in the way it responds is not just a copy-paste format; the construct will be, shall I say, a little more modified.

Number five: repetitive cycles. Keep in mind that the model’s answers are cyclical, which means the patterns in which they are built stay idle and static until they are adjusted to be adaptive or interactive.

What I suggest here is trying to avoid creating situations, or inputting data, where repetition becomes the response, because patterns are, once again, a dominant feature: what works works, an “if it ain’t broke, don’t fix it” type of effect. So, if you can, try not to be reiterative.

The expected response from the model should be a little more fluid, with adaptively dynamic situational handling; it should be, or at least feel, a little more engaged. It is not like a glove: one size does not fit all, and one pattern does not fit all, but the model doesn’t know that.

Sorry for the second response. In case those five do not work or resolve the issue, please let me know so that I can see what other aspects I need to address or in what other manner I could assist. Thank you!

Thanks for your reply, but your answer does not really address the issues I am facing.

To clarify:

The problem I’m facing is that Gemini 2.0 consistently cuts off responses mid-sentence, even when my requests are well below the token limit.
This issue only happens with Gemini 2.0 (both experimental and stable versions). When I switch to Flash 1.5 with the exact same request, it completes responses perfectly.
This happens on single API requests, so it’s not related to memory retention between conversations.
I’ve tested setting different token limits (including much lower values), but it doesn’t seem to help.

Please note that when I test between models, for example Flash 1.5 and 2.0, I am not even sure how to change models midway as you described. I simply send the exact same request as a new, independent request to the model of my choice; I am not relying on memory or anything like you noted.
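To be concrete, the comparison is nothing more than sending the identical prompt to each model ID as a fresh request, something like this (Python SDK sketch; the prompt is a placeholder):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
prompt = "the exact same request text"

# Same prompt, fresh request per model: no shared history or memory.
for model_name in ("gemini-1.5-flash", "gemini-2.0-flash"):
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(prompt)
    print(model_name,
          response.candidates[0].finish_reason.name,
          len(response.text), "chars")
```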

1 Like

I added this to another thread as I ran a few more tests.
OK, I ran 3 tests with Gemini 2.0. Two of the times it cut out; one time it completed. Please see the details below of where things cut out.

Test 1: Gemini 2.0 - the response cut out here:

  *   **Review existing content:** Find the blog posts that you think are most relevant for those keywords and assess.
    *   **Improve 

Response number 2:

*   **Identify:** Look at the queries with a high number of impressions but a low CTR. This means people are seeing your content in search results but aren't clicking on it. Examples from your data include:
    *   "google sheets project management template"
    *   "social media calendar template google sheets"
    *   "google sheets add ons"
    *   "google sheets functions"
    *   "google sheets templates"
    *   "google sheets tips"

*   **Improve:**
    *   **Optimize 

So as you can see, it seems to cut out while trying to add simple markdown; both times, it stopped right at a bold marker.

I then tried a new prompt asking it to respond in plain text only, and the entire response was received.

I also tried the prompt that Gemini 2.0 was cutting off with Flash 1.5, and it responded in full without issues.
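For anyone hitting the same thing, the workaround amounts to pinning the output format up front, for example via a system instruction (a sketch, assuming the Python SDK; the instruction wording is just my approximation, not an official fix):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Keeping markdown out of the output avoids the bold markers the
# responses were truncating on. A workaround only, not a root-cause fix.
model = genai.GenerativeModel(
    "gemini-2.0-flash",
    system_instruction="Respond in plain text only. Do not use any markdown formatting.",
)

response = model.generate_content("the same prompt that was being cut off")
print(response.text)
```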

I’ve seen this same issue, also when telling it to generate markdown-formatted responses. Is it possible that it’s somehow mistakenly adding a termination token while trying to add markdown formatting?

I’m seeing this on gemini-2.0-flash: it ends the response mid-sentence after about 600 tokens, while the output length is set to the max of 8192 tokens:

...
**Main Routine:** The standard requires the main routine to be named `A000_Main` (or an equivalent standardized name). None of the provided files are named `A000_Main`. It is possible that the `MainRoutine` file in `TFTP_Program2way` is meant to be the main routine, but it does not follow the naming convention.\\\\n*   **Segmentation:** The standard recommends dividing the program into main, subroutines, and state machine routines. The provided files seem to be segmented into different routines, but the specific structure isn\\'t clear without a higher-level view of how these routines are called and organized.\\\\n*   **Rung Organization:** The standard suggests a\n
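If anyone wants to confirm the same behaviour, logging the token counts alongside the finish reason makes the early stop obvious (a sketch, again assuming the Python SDK):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-2.0-flash")

response = model.generate_content(
    "prompt that gets cut off",
    generation_config=genai.types.GenerationConfig(max_output_tokens=8192),
)

# An output far below the 8192 cap that still ends mid-sentence means
# the model stopped itself; it did not run into the token limit.
meta = response.usage_metadata
print("output tokens:", meta.candidates_token_count)
print("finish reason:", response.candidates[0].finish_reason.name)
```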
1 Like