I’m using Gemini 2.0 Flash with structured output in various languages. If the output contains accented characters, as is common in French for example, these characters are sometimes generated correctly, but often they come out garbled in a variety of ways.
I’ve seen accented e’s replaced by \[e], so no information about the accent but at least the unaccented letter. I’ve seen accented letters or descriptions wrapped in \[\] (sometimes with backslashes, sometimes without), e.g. \[è\] or \[egrave]. I’ve seen them replaced by a number of \n’s or \t’s… so no letter and no accent. I’ve seen unexpected but identifiable encodings, e.g. \u00e9, \u00f4. But I’ve also seen these encodings with a \n instead of a \u… And I’ve seen the hex numbers without anything, so simply 00e9.
I think there were more that I didn’t document…
I try my best to reconstruct what I can… but obviously this doesn’t work when letters or information about accents are missing.
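A minimal sketch of that kind of reconstruction, in Python. It only covers the patterns above that still carry enough information: literal \u00e9-style escapes, bracketed entity names such as \[egrave], and bracketed accented letters such as \[è\]. The function name repair_accents and the list of accent suffixes are assumptions made for illustration, not part of any Gemini API.

```python
import re
from html.entities import name2codepoint

# Literal escapes such as \u00e9 that arrive as six plain characters.
_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")
# Bracketed HTML-style entity names such as [egrave] or \[egrave\].
# Restricted to accent suffixes so ordinary words like [and] are not touched.
_BRACKET_ENTITY = re.compile(
    r"\\?\[([A-Za-z](?:grave|acute|circ|uml|tilde|cedil|ring))\\?\]"
)
# A single accented Latin-1 letter wrapped in brackets, e.g. \[è\].
_BRACKET_LETTER = re.compile(r"\\?\[([À-ÖØ-öø-ÿ])\\?\]")


def repair_accents(text: str) -> str:
    """Best-effort repair of the recoverable patterns only (hypothetical helper)."""
    text = _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

    def _entity(m: re.Match) -> str:
        codepoint = name2codepoint.get(m.group(1))
        # Leave the match untouched when the name is not a known entity.
        return chr(codepoint) if codepoint is not None else m.group(0)

    text = _BRACKET_ENTITY.sub(_entity, text)
    return _BRACKET_LETTER.sub(lambda m: m.group(1), text)
```

Cases like \[e], a bare 00e9, or a run of \n’s are deliberately left alone: either the accent is gone or the pattern is too ambiguous to rewrite safely.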
Has anybody else encountered this / found a way to reduce the chance of this going wrong?
Hi @Xavier_Van_Elsacker, welcome to the forum!
Thanks for reporting. Could you please check whether the same thing happens with the 2.5 Pro model?
Otherwise, please share your schema and prompt if possible so I can reproduce the issue on my end.
Thanks
Hi,
Thanks for getting back to me. I’ve tried to reproduce this with 2.5 Pro, using prompts that went wrong in the past (and that still go wrong when retried with 2.0 Flash today) as well as a random set of prompts.
So far all 2.5 Pro output has been OK, which seems promising. I’ll continue to test the prompts that cause problems with 2.0.
Is the expectation that this will also be fixed in 2.5 Flash when it gets released? 2.5 Pro is too slow for my use case, so it’s not something I can switch to in general.
That’s good to hear. Yeah, the Pro models are a little heavier, so you may see some latency.
Yes, it should definitely be resolved in 2.5 Flash. We gather feedback to improve things in the next version.
I’ve now also tried with gemini-2.5-flash-preview-05-20. While I did not see the strange encodings I got with 2.0, I do see the repetitions that are also mentioned elsewhere. These repetitions start at the point where an accented character should be printed. I’ve seen “\n\t\t\t\t\t…”, “\r\n\r\n\r\n\r\n…”, and “\n\n\n\n\n…”. They repeat until the max token count is reached.
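One way to at least detect these runs on the client side is sketched below in Python. It is an illustration only: the function names and the threshold of 20 are assumptions, and it works on the response text alone without calling any Gemini API. It matches both real control characters and their escaped form (a backslash followed by n, r, or t), since raw JSON output carries the latter.

```python
import re


def find_runaway_whitespace(text: str, min_run: int = 20) -> int:
    """Return the index where a run of min_run or more whitespace tokens starts, or -1."""
    # Each token is either an escaped sequence (\n, \r, \t as two characters)
    # or the actual control character.
    pattern = r"(?:\\[nrt]|[\n\r\t]){%d,}" % min_run
    match = re.search(pattern, text)
    return match.start() if match else -1


def truncate_runaway(text: str, min_run: int = 20) -> tuple[str, bool]:
    """Cut the text at the first runaway run; the flag reports whether a cut happened."""
    start = find_runaway_whitespace(text, min_run)
    if start == -1:
        return text, False
    return text[:start], True
```

If the response is consumed as a stream, a check like this on the accumulated text could be used to stop reading and retry early instead of waiting for the max token count, though the truncated output will still be incomplete.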