Gemini 2.0 Flash has a weird bug

We keep hitting this bug, especially with Gemini-2.0-Flash.

It occurs when using the model with:

  • one or more PDFs in the context
  • a difficult task to perform
  • output that involves Markdown table generation
  • temperature set to 0

In many cases it starts the answer and then fails in the first lines of table generation, sometimes outputting a lot of whitespace and some random characters.

Try it for yourself:

For me this fails every time, both in Vertex and AI Studio, giving this output:

Okay, here’s the conversion of the provided data into a single Github Flavored Markdown table, combining the various tables where it makes sense to do so. I’ve tried to maintain the original formatting and information as accurately as possible. Note that some of the original formatting (like precise column widths) is not possible in Markdown.

## Alphabet Inc. - Consolidated Financial Data (2021-2023)

| Description                                      | 2021 (Millions) | 2022 (Millions) | 2023 (Millions) 
(then a lot of white spaces)

This makes the Flash model hard to use, because in our tasks, which involve PDFs and generating tables, it is hit or miss whether it is going to crash or not. We do notice that Gemini-2.0-Pro seems to suffer less from the issue, although it also happens with that model.

UPDATE: For the case above I found a workaround. Just add the following statement to the system prompt:

For tables, please use the basic GFM table syntax and do NOT include any extra whitespace or tabs for alignment.

That somehow seems to be enough to keep the model from faltering during Markdown table generation. I am going to test it further on our internal tasks.
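In case it helps anyone wire this up: below is a minimal sketch of passing that statement as a system instruction with the @google/generative-ai JS SDK. The model id, key handling, and user prompt are placeholders, not my exact setup.

    // Minimal sketch: pass the workaround as a system instruction.
    // Model id, key handling, and prompt are placeholders.
    import { GoogleGenerativeAI } from '@google/generative-ai';

    const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

    const model = genAI.getGenerativeModel({
      model: 'gemini-2.0-flash',
      systemInstruction:
        'For tables, please use the basic GFM table syntax and do NOT ' +
        'include any extra whitespace or tabs for alignment.',
      generationConfig: { temperature: 0 }, // the setting that triggers the bug
    });

    const result = await model.generateContent(
      'Convert the financial data in the attached PDF into a single GFM table.',
    );
    console.log(result.response.text());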


I encounter the same bug with Vercel’s AI SDK.

  • Complex prompt
  • Output is supposed to be a Markdown table with a lot of content
  • Response streaming gets stuck

Simple tables work fine.

Unfortunately the system prompt addition didn’t work for me.

I must say that for us the issue is also persistent. We primarily use the Pro experimental model, and we managed to make it a bit more reliable by forcing it to use “minimal” formatting for Markdown tables. But it still goes wrong too often, failing on the Markdown table output. It is also really hard to make the model comply with not doing a lot of table formatting (whitespace, alignment of columns, etc.).

We are now implementing an approach where the Pro model is no longer allowed to output tables at all. We then feed the output to a second model that does the formatting, creates the tables, etc. We will see how that works; hopefully it becomes more stable.
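For anyone who wants to try the same split, here is a rough sketch of what such a two-call pipeline could look like with the @google/generative-ai JS SDK; the model ids, instructions, and prompts are illustrative, not our exact setup.

    // Pass 1: answer the task, tables forbidden.
    import { GoogleGenerativeAI } from '@google/generative-ai';

    const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

    const answerer = genAI.getGenerativeModel({
      model: 'gemini-2.0-pro-exp-02-05', // placeholder model id
      systemInstruction:
        'Answer in plain text. Use lists for structured data; never output tables.',
    });

    // Pass 2: a second call only reformats, turning list-shaped data into tables.
    const formatter = genAI.getGenerativeModel({
      model: 'gemini-1.5-pro', // placeholder; any model that formats reliably
      systemInstruction:
        'Reformat the given text as Markdown. Convert list-shaped data into ' +
        'basic GFM tables without alignment whitespace.',
    });

    const question = '...the user question, plus the PDF context...'; // placeholder
    const draft = await answerer.generateContent(question);
    const formatted = await formatter.generateContent(draft.response.text());
    console.log(formatted.response.text());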

It is really strange to me that more people are not reporting this issue, because it makes the models really unreliable.

@Bernd_Holbein are you working in a language other than English? We work in Dutch (Dutch documents, prompts, and answers, etc.); maybe that also plays a role…

Yes, the main prompt is German in our case.

So @Stijn_Tonk, this is interesting. When I start with a simple English prompt, the model first refuses to create a table. When I ask more explicitly, it creates a table in Markdown format, but wrapped in a code block. After a lot of nudging, the model will eventually create a Markdown table with a lot of content.

If I reactivate the initial complex German prompt, it immediately fails to process the stream.

@Bernd_Holbein And when you use the English prompt and it succeeds, did you still have the model output its response in German, or was the response then also in English?

Have you also tried other table formats, like HTML, CSV, etc.? That is still on our list to try out.

@Stijn_Tonk I am also seeing this exact issue when I have the temperature set to 0. I can reproduce it consistently with a specific prompt. However, when I set the temperature to 1, it no longer happens. Not sure I would call this the best solution, but I am curious whether this also fixes your issue, @Bernd_Holbein.

Hi @Thimo, I have indeed seen some cases where a higher temperature eliminates the issue, maybe because the sampling leads the answer in a different direction. But I have cases that are apparently so tough that even a higher temperature does not resolve the issue.

Facing the same issue here with gemini-2.0-flash! Despite adding system prompt instructions not to include extra whitespace or tabs, the output still spits out a few thousand spaces after the second header in the Markdown table. Sometimes it just goes on forever and never stops.

| Types | Characteristics <...thousands of spaces here>

This is definitely not ready for production, especially since you don’t know whether it will use a table (unless you explicitly tell it not to create tables in the system prompt). Unfortunately, Gemini 2 Pro is not available for production yet, so I’ll stick with Gemini 1.5 Pro for now.

Update:
I get the exact same issue using gemini-2.0-pro-exp-02-05. No problems with gemini-1.5-pro, however.


Two things we are now doing to mitigate this issue (as switching back to 1.5 is not an option for us) that you could try out:

  • First output the information as plain text: only lists, no tables. Then make a separate call to do the Markdown formatting.
  • For the thinking model, we stepped away from Markdown in favor of HTML output, which seems to work more reliably.

I really do not get why Google is so silent on this topic and is not helping us out here. The fact that their model is so good with (many) PDFs is pure gold; we have many customers that benefit from this capability. If only we could use their models reliably…


No matter what system prompt I use, it will not listen to me. I think there is something deeper going on with Markdown tables. For example, adding the following two lines to the system prompt:

    - For markdown tables, ensure they are in GFM format and there is exactly only one space between each " |". Do not forget this or you will be punished.
    - For markdown tables, each header MUST follow the format like | **Header Title** |

does not affect the output Markdown table at all. I just get something like:

| Tasks                                 | Mechanism of Action                           <...thousands of spaces>

I would at the very least expect the above to look like:

| **Tasks**             | **Mechanism of Action** |        ...

Yet it seems to follow its own internal format. Hopefully we get an official response soon, before more people inevitably run into this same issue!

For reference, the above was tested using gemini-2.0-flash with a system prompt of about 16k tokens’ worth of instructions.

It is indeed kind of tricky to get it to listen. In our case, we remind it to produce Markdown tables without extra whitespace and alignment both in the system prompt and in a reminder appended to the user prompt. In the system prompt we also show an example of what a table without the whitespace looks like. But even then it sometimes messes up… :frowning:
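Roughly, the prompt construction looks like the sketch below; the wording and the example table are illustrative (our real prompts are in Dutch).

    // Sketch of the double reminder plus a few-shot example (wording illustrative).
    const tableRule =
      'Output Markdown tables in basic GFM syntax, with a single space around ' +
      'each cell value and no extra whitespace or column alignment. Example:\n' +
      '| Name | Value |\n' +
      '| --- | --- |\n' +
      '| Revenue | 1000 |';

    const baseInstructions = '...your existing system prompt...'; // placeholder
    const question = '...the user question...'; // placeholder

    // The rule goes into the system prompt AND is appended to every user turn.
    const systemInstruction = `${baseInstructions}\n\n${tableRule}`;
    const userPrompt = `${question}\n\nReminder: ${tableRule}`;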

Have you tried switching to HTML tables?

Aha I figured it out.

So here’s my theory: the reason Markdown tables aren’t working well here is that at a lower temperature, the LLM picks the most probable token. And with Markdown tables, Google’s model must have been trained on tables with long, padded headers, which would make sense because it’s always at the 2nd or 3rd header that the error begins. Usually the first column is an id (so it’s short), but the second column is usually padded out to the length of its longest cell.

So what’s going on is that when the LLM predicts the next token, the most probable token is a space. But the problem is that the LLM doesn’t know how long the text inside the table cell will be (since it can’t look ahead past the next token), so it gets stuck in a loop of prioritizing a space.
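To make that concrete, here is a toy sketch of temperature-scaled sampling (illustrative only, not Gemini’s actual decoder): at temperature 0 the argmax token wins every step, so once a space is the single most probable continuation it keeps winning, while at temperature 1 tokens like “|” keep a real chance of being drawn.

    // Toy illustration only; not Gemini's actual sampling code.
    function sampleToken(logits: number[], temperature: number): number {
      if (temperature === 0) {
        // Greedy decoding: always pick the single most probable token.
        return logits.indexOf(Math.max(...logits));
      }
      const scaled = logits.map((l) => l / temperature);
      const max = Math.max(...scaled);
      const exps = scaled.map((l) => Math.exp(l - max)); // numerically stable softmax
      const sum = exps.reduce((a, b) => a + b, 0);
      let r = Math.random() * sum;
      for (let i = 0; i < exps.length; i++) {
        r -= exps[i];
        if (r <= 0) return i;
      }
      return exps.length - 1;
    }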

In my case, I was using Vercel’s AI SDK, which sets the temperature to 0 by default. Gemini 1.5 had a temperature range of 0-1, while Gemini 2 has a range of 0-2. Explicitly setting the temperature to 1 solved my problem, but anything less than 1 still triggers the bug.
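For other AI SDK users, a minimal sketch of the fix (assuming v4-style streamText; the model id and prompt are placeholders):

    import { google } from '@ai-sdk/google';
    import { streamText } from 'ai';

    const result = streamText({
      model: google('gemini-2.0-flash'),
      temperature: 1, // override the SDK default of 0; anything below 1 still hung for me
      prompt: 'Summarize the attached report as a GFM table.', // placeholder
    });

    for await (const chunk of result.textStream) {
      process.stdout.write(chunk);
    }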

It’s definitely a bug, but I doubt it’s fixable on our end, since the behavior comes from training. If you need a temperature below 1, you’ll just have to wait until fine-tuning is available.
