Markdown and XML Closing Tag Issues

Gemini 2.0 Flash
I tried to enclose data in Markdown code blocks, XML tags, or even ``, but sometimes the closing tag is missing from the output.
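One way to at least catch this before using a response is to check whether every opened tag was closed. Below is a minimal sketch, assuming the data is wrapped in simple XML-style tags (names like `srt` or `block` are hypothetical) or in ``` fences. The function `find_unclosed` is illustrative, not part of any SDK, and its regex is a heuristic rather than a real XML parser.

```python
import re

# Heuristic: matches <name ...>, </name>, and <name .../> with XML-like names.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")


def find_unclosed(text: str) -> list[str]:
    """Return tag names opened but never closed, plus '```' for an unbalanced fence."""
    stack: list[str] = []
    unclosed: list[str] = []
    for closing, name, self_closing in TAG_RE.findall(text):
        if self_closing:
            continue  # <tag/> needs no partner
        if not closing:
            stack.append(name)
        elif name in stack:
            # Pop back to the matching opener; anything skipped was never closed.
            while (top := stack.pop()) != name:
                unclosed.append(top)
        # A stray closing tag with no opener is ignored in this sketch.
    unclosed.extend(stack)  # openers still waiting at the end of the text
    if text.count("```") % 2 == 1:
        unclosed.append("```")
    return unclosed
```

For example, `find_unclosed("<srt><block>hi</block>")` returns `['srt']`. A non-empty result could trigger a retry or a smaller prompt instead of silently accepting the output.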

Hi @RandomGuy,

Could you clarify this further? Can you provide an example of the error you are seeing? Is the issue occurring consistently or intermittently?

Thanks.

It’s fairly random and happens with both Flash 2.0 and the thinking model. For example, I translate SRT subtitle files, splitting them into batches of 50 subtitle blocks per prompt, with around 30k tokens per example. Generally, once the context reaches roughly 40-50k tokens, it starts dropping closing tags, re-translates text it has already translated, and merges subtitle lines whose meanings are close together.
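The batching described above, plus a structural check that would flag merged lines, can be sketched roughly as follows. This assumes standard SRT layout (index line, timing line, then text lines, with blocks separated by blank lines); the helper names `parse_srt`, `batches`, and `compare_batch` are hypothetical. The comparison is a heuristic: it flags any batch whose block count or per-block line count changed after translation.

```python
import re


def parse_srt(text: str) -> list[list[str]]:
    """Split SRT text into blocks; each block is the list of its lines."""
    text = text.replace("\r\n", "\n").strip()
    return [b.split("\n") for b in re.split(r"\n\s*\n", text) if b.strip()]


def batches(blocks: list[list[str]], size: int = 50) -> list[list[list[str]]]:
    """Group blocks into batches of `size` (50 per prompt, as in the post)."""
    return [blocks[i:i + size] for i in range(0, len(blocks), size)]


def to_srt(blocks: list[list[str]]) -> str:
    """Join blocks back into SRT text, e.g. to put one batch into a prompt."""
    return "\n\n".join("\n".join(b) for b in blocks) + "\n"


def compare_batch(src: list[list[str]], out: list[list[str]]) -> list[str]:
    """List structural differences between a source batch and its translation."""
    problems = []
    if len(src) != len(out):
        problems.append(f"block count {len(src)} -> {len(out)}")
    for a, b in zip(src, out):
        if len(a) != len(b):
            problems.append(f"block {a[0]}: {len(a)} lines -> {len(b)} lines")
    return problems
```

If `compare_batch` reports anything, the batch can be re-sent on its own rather than trusting the merged result.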

My workaround is to not use any tags at all. I also slice the history (-20) on each iteration to drop old prompts, keeping the context within about 30k tokens. This works, although it still occasionally merges subtitle lines. I don't think the model understands that \n is very important.
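The slicing workaround might look something like this sketch: keep the last 20 turns, then drop the oldest ones until the history fits a token budget. The turn format (`{"role": ..., "text": ...}`) and `trim_history` are assumptions for illustration, and `count_tokens` is any caller-supplied function; a real token counter from your SDK would be more accurate than a character count.

```python
from typing import Callable


def trim_history(history: list[dict], count_tokens: Callable[[str], int],
                 max_turns: int = 20, max_tokens: int = 30_000) -> list[dict]:
    """Keep at most `max_turns` recent turns whose total size fits `max_tokens`."""
    recent = history[-max_turns:]  # the "-20" slicing step
    total = sum(count_tokens(t["text"]) for t in recent)
    # Drop the oldest remaining turn until the budget is respected.
    while recent and total > max_tokens:
        total -= count_tokens(recent[0]["text"])
        recent = recent[1:]
    return recent
```

Depending on the API, it may be safer to drop turns in user/model pairs so the trimmed history still alternates roles.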


Oh, one more issue: if the transcript is Chinese, Korean, or Japanese and contains the full-width space U+3000, the AI treats it like a newline character (\n), so the subtitle block ends up with more lines than normal. A normal subtitle has one or two lines, but when the AI encounters a full-width space it may break the subtitle into two, three, or four lines within a single block.
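A possible mitigation, sketched under the assumption that U+3000 in these transcripts is only a visual separator: replace it with an ordinary space before prompting, and afterwards flag any output block with more text lines than expected. The names `normalize_cjk_spaces` and `overlong_blocks` are hypothetical, and the two-line limit comes from the observation above, not from the SRT format itself.

```python
import re

IDEOGRAPHIC_SPACE = "\u3000"  # full-width space common in CJK text


def normalize_cjk_spaces(srt_text: str) -> str:
    """Replace U+3000 with an ASCII space (assumes it never means a line break)."""
    return srt_text.replace(IDEOGRAPHIC_SPACE, " ")


def overlong_blocks(srt_text: str, max_text_lines: int = 2) -> list[str]:
    """Return the index lines of blocks whose subtitle text exceeds max_text_lines."""
    bad = []
    text = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", text):
        lines = block.split("\n")
        # First two lines are the index and the timing line.
        if len(lines) - 2 > max_text_lines:
            bad.append(lines[0])
    return bad
```

Normalizing first removes the ambiguous character from the prompt; the check afterwards still catches blocks that were split for other reasons.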

Hey @RandomGuy, thanks for the clarification. Now I understand it better.

As I see it, you are currently facing three issues: the closing tag issue (Markdown/XML), repetitive translation and line merging, and the full-width space (U+3000).

I need some supporting prompts and screenshots to escalate this issue. Could you help by providing the following:

For the U+3000 issue: could you provide a short snippet of the original Chinese/Korean/Japanese text that contains U+3000, along with the corresponding output where the AI incorrectly inserted newlines? A screenshot of the problematic output would also be very helpful if it clearly shows the extra lines.
For the missing closing tags: if possible, a brief example of the input prompt (showing how you tried to enclose the data) and the malformed output.
For the line merging: a few original SRT lines and how they were incorrectly merged in the output.

Thanks