Characters with accents are generated in lots of wrong ways

I’m using Gemini 2.0 Flash with structured output in various languages. When the output contains accented characters, as is common in French for example, they are sometimes generated correctly but often come out mangled in a variety of ways.

I’ve seen accented e’s replaced by \[e], so no information about the accent but at least the unaccented letter. I’ve seen accented letters or descriptions wrapped in \[\] (sometimes with backslashes, sometimes without), e.g. \[è\] or \[egrave]. I’ve seen them replaced by a number of \n’s or \t’s… so no letter and no accent. I’ve seen unexpected but identifiable encodings, e.g. \u00e9, \u00f4. But I’ve also seen these encodings with a \n instead of a \u… And I’ve seen the hex numbers without anything, so simply 00e9.

I think there were more that I didn’t document…

I try my best to reconstruct what I can… but obviously this doesn’t work when letters or information about accents are missing.
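For illustration, here is a minimal sketch of the kind of reconstruction that is possible for the recoverable cases listed above. It is not the actual code in use: the function name `repair_accents` and the exact patterns are assumptions, and it assumes the backslashes arrive as literal characters in the decoded string. It deliberately leaves bare hex such as 00e9 alone, since that cannot be told apart from ordinary digits, and it cannot recover anything when the letter was replaced by tabs or newlines.

```python
import re
from html.entities import html5  # maps names such as "egrave;" to "è"

# Literal "\u00e9"-style escapes, or the same with "\n" in place of "\u".
# The "\n" variant is limited to U+00C0..U+00FF to reduce false positives.
_HEX = re.compile(r"\\u([0-9a-fA-F]{4})|\\n(00[c-fC-F][0-9a-fA-F])")
# Bracketed letters or entity names, with or without backslashes:
# [è], \[è\], [egrave], \[egrave]
_BRACKETED = re.compile(r"\\?\[([^\[\]\\]{1,10})\\?\]")


def _is_accented_latin(ch: str) -> bool:
    # Latin-1 Supplement and Latin Extended-A letters only (é, ô, ç, œ, ...).
    return len(ch) == 1 and ch.isalpha() and 0xC0 <= ord(ch) <= 0x17F


def repair_accents(text: str) -> str:
    """Hypothetical helper: undo the recoverable mis-encodings."""

    def from_hex(m: re.Match) -> str:
        code = int(m.group(1) or m.group(2), 16)
        if 0xD800 <= code <= 0xDFFF:  # lone surrogate, leave untouched
            return m.group(0)
        return chr(code)

    def from_bracket(m: re.Match) -> str:
        inner = m.group(1)
        if _is_accented_latin(inner):  # [è] -> è
            return inner
        decoded = html5.get(inner + ";")  # [egrave] -> è
        if decoded is not None and _is_accented_latin(decoded):
            return decoded
        return m.group(0)  # e.g. [e] or [notes]: nothing to recover

    text = _HEX.sub(from_hex, text)
    return _BRACKETED.sub(from_bracket, text)
```

Anything the helper does not recognise is passed through unchanged, so a result that still contains brackets or stray escapes can be flagged for a retry instead of being silently accepted.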

Has anybody else encountered this / found a way to reduce the chance of this going wrong?

Hi @Xavier_Van_Elsacker, welcome to the forum!

Thanks for reporting. Could you check whether the same thing happens with the 2.5 Pro model?
Otherwise, please share your schema and prompt if possible so I can reproduce the issue on my end.

Thanks