Gemini API - Special Character Problem

,

Hello,

I have an agent that performs real-time analysis of videos uploaded to specific YouTube channels. I send the video link and some instructions along with the prompt to the following endpoint (Gemini API):

https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent

It had been working without issues for a long time. However, I started getting an error like the one below with Gemini 2.5 Pro. I then switched to Gemini 3 Pro thinking it was a model-related issue, but the error persists. From what I understand, there’s a problem related to Turkish characters and UTF-8.

This system runs on n8n. And the problem is this doesn’t happen with all requests. Some of them returns an unreadable text. I would really appreciate it if you could help.

The Prompt that I use:

**return {
contents: [
{
role: ā€œuserā€,
parts: [
{
fileData: {
mimeType: ā€œvideo/*ā€,
fileUri: $input.first().json.videoUrl
}
},
{
text: `This is a long prompt so I removed it`
}
]
}
],
generationConfig: {
responseMimeType: ā€œtext/plainā€
}
};

This is what I get from Gemini (http request):**

AJet鈄檌n yurt d谋艧谋 učŽ½u艧lar谋 ičŽ½in gēœ‰ncellenen bagaj prosedēœ‰rlerini, limitlerini ve bilet segmentlerine gꞚre (Basic, EcoJet, Flex, Premium) de臒i艧en bagaj haklar谋n谋 yolculara net bir 艧ekilde Ꞛ臒retmek.\n\n**RTB:**\nKēœ‰čŽ½ēœ‰k čŽ½anta, kabin bagaj谋 ve kay谋tl谋 bagaj ičŽ½in sunulan kesin ꞚlčŽ½ēœ‰ler (cm), a臒谋rl谋k limitleri (kg) ve bilet paketlerinin kapsam谋n谋 gꞚsteren detayl谋 kar艧谋la艧t谋rma tablosu.

1 Like

The issue seems to be UTF-8 encoding of Turkish characters. Ensuring all text is properly UTF-8 encoded should fix the unreadable text errors.

Yes that’s the problem but I am trying to understand what triggers this issue after working for a year without a problem.

Http request asks for a raw json format. Changing this into json may solve the issue. Just thinking aloud.

I tried many things and settings and parameters and I think this is a Gemini endpoint bug.

Problem Description

The API sometimes returns Turkish text with character encoding corruption, where UTF-8 Turkish characters are incorrectly encoded as Chinese characters. This appears to be a server-side encoding issue affecting approximately 30-40% of requests.

I believe some of the servers are generating corrupted characters since there are several servers dealing with my http request. Some of them works fine, some of them not.

Affected Endpoint

POST https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent

Character Corruption Examples

Expected (UTF-8) Received (Corrupted) Hex Bytes (Expected) Hex Bytes (Received)
ı (Turkish i) 谋 (Chinese) c4 b1 e8 b0 8b
ş (Turkish s) 艧 (Chinese) c5 9f e8 89 a7
ü (Turkish u) ēœ‰ (Chinese) c3 bc Unknown
Ƨ (Turkish c) čŽ½ (Chinese) c3 a7 Unknown
ğ (Turkish g) 臒 (Chinese) c4 9f Unknown
ƶ (Turkish o) Ꞛ (Chinese) c3 b6 Unknown

Request Characteristics

  • Input Language: Turkish (UTF-8 encoded prompt)

  • Input Type: Video analysis with text prompt

  • Response Language: Turkish (expected)

  • Frequency: ~30-40% of requests return corrupted encoding

  • Pattern: Appears to be load-balancer or regional server dependent

Example 1: Corrupted Response

{
ā€œcandidatesā€: [
{
ā€œcontentā€: {
ā€œpartsā€: [
{
ā€œtextā€: ā€œStratejik Mesaj:\nAJet鈄檌n yurt d谋艧谋 učŽ½u艧lar谋 ičŽ½in gēœ‰ncellenen bagaj prosedēœ‰rleriniā€¦ā€
}
]
}
}
]
}

Decoded Text (corrupted):

ā€œAJet’in yurt d谋艧谋 učŽ½u艧lar谋 ičŽ½in gēœ‰ncellenen bagaj prosedēœ‰rlerini..ā€

Expected Text:

ā€œAJet’in yurt dışı uƧuşları iƧin güncellenen bagaj prosedürleriniā€¦ā€

Encoding Corruption Pattern

The corruption suggests the response is being:

  1. Generated correctly as UTF-8

  2. Incorrectly interpreted/transcoded through Big5 or GB2312 encoding

  3. Re-encoded as UTF-8 with Chinese character mappings

Byte-Level Evidence

Turkish ā€œdÄ±ÅŸÄ±ā€ should be: 64 c4 b1 c5 9f c4 b1 (UTF-8)

Instead received: 64 e8 b0 8b e8 89 a7 e8 b0 8b (UTF-8 bytes representing Chinese chars)

This indicates a double-encoding issue or incorrect charset interpretation on certain backend servers.

Impact

  • Severity: High - Makes Turkish language responses unusable

  • Workaround: Client-side character mapping (inefficient and fragile)

  • Business Impact: Prevents production use of Gemini API for Turkish content analysis

I hope there will be a solution soon or the problem is somewhere else where I can fix it personally.

Does anyone have the same issue?

Hi again,

adding charset parameter for the http request seems to be solution for this problem;

application/json; charset=utf-8

But I still want to understand where the problem is coming from. If it’s coming from Gemini endpoint this is a serious bug started to happen in the last 2 weeks and needs to be resolved without a workaround like charset parameter.

It worked for a few days without any issues, but the character encoding problem has started again now. In short, the issue wasn’t resolved with the charset=utf-8 parameter.

If anyone has knowledge about this, I’d appreciate it if you could let me know.

This is getting crazy.

Gemini App itself sent me an answer with corrupted characters. This issue is not about http request, API or n8n. This is something behind the scenes within Gemini itself.

I’d appreciate if someone can give me an advice because this is a high impact issue for me =/

Still happening. Any clue anyone =/

:waving_hand: hey thought this might help I red teamed it hope it works as a temp solution :+1:

Gemini API – Special Character / Encoding Issue (Turkish UTF-8)

TL;DR

Not a prompt or client bug

UTF-8 Turkish characters are intermittently misinterpreted as GBK on some backend nodes

Example: ı (C4 B1) → interpreted as GBK → 谋

Occurs ~30–40% of requests; reproduced in the Gemini app itself

Practical workaround:

Prevent with ASCII-only Unicode escapes in JSON

Detect + retry if corruption slips through

What’s happening (short version)

We traced this to classic mojibake occurring after model generation, during transport or middleware handling.

Concrete proof:

Turkish ı → UTF-8 bytes: 0xC4 0xB1

Those bytes interpreted as GBK → Chinese character 谋

Re-encoded to UTF-8 → corrupted output downstream

The byte mapping is exact. This strongly suggests that a subset of backend instances is decoding UTF-8 responses as GBK. The issue appearing inside the Gemini app rules out client-side parsing.

Strategy 1: ASCII Shield (Prevention)

If the response never contains multi-byte characters, the misconfigured node has nothing to corrupt.

System instruction (place near the end for emphasis):

> IMPORTANT OUTPUT RULE:

Output valid JSON only. For all non-ASCII characters (especially Turkish: ı, ş, ğ, ü, ö, ç), use Unicode escape sequences.

Example:

{ā€œtextā€:ā€œd\u0131\u015f\u0131ā€}

not

{ā€œtextā€:ā€œdÄ±ÅŸÄ±ā€}

Why it works:

The payload stays pure ASCII during transport. When the client parses JSON, the escapes decode back into correct Turkish characters automatically.

Strategy 2: GBK Repair (Detection + Cure)

LLMs can occasionally slip on strict formatting, so it’s useful to detect and recover.

Python

def robust_repair(text):

\# If no CJK-range characters, leave it alone

if not any('\\u4e00' <= char <= '\\u9fff' for char in text):

    return text

try:

    \# Invert the exact UTF-8 → GBK misinterpretation path observed

    return text.encode('gbk').decode('utf-8')

except (UnicodeEncodeError, UnicodeDecodeError):

    return text

JavaScript (n8n Code node)

function robustRepair(text) {

const hasChineseChars = /\[\\u4e00-\\u9fff\]/.test(text);

if (!hasChineseChars) return text;

// n8n lacks native GBK decoding

// Recommended: flag, retry request, and log sample

return {

    text,

    corrupted: true,

    needsRetry: true

};

}

n8n pattern: detect → retry once → log.

Avoid partial repairs in JS without proper charset libraries.

Recommended setup

Use both approaches:

1. ASCII Shield to prevent corruption

2. Detection to catch rare failures

3. Retry once before logging or repair

This turns an intermittent encoding ghost into a controlled, observable failure mode.

Note for Google engineers (if anyone from the team is reading)

This appears to be a middleware charset issue, not a model issue:

UTF-8 bytes (C4 B1) interpreted as GBK (谋)

~30–40% occurrence suggests specific backend instances

Reproduced in the Gemini app itself

Comparing response headers / request IDs between good and corrupted responses may help isolate the affected pool quickly.

Hope this helps unblock things while an official fix is in flight.

~ :vulcan_salute: Cmd.Proton & The Lab :test_tube:

1 Like

Thank you so much @CMD_Proton & The Lab for this detailed analysis! This confirms exactly what I was seeing.

the byte mapping explanation (UTF-8 → GBK → mojibake) makes perfect sense now. I’ll try to implement the ASCII Shield approach in my prompts and add the GBK repair function as a fallback in my workflow. Really appreciate you taking the time to red team this. Hope Google addresses this soon.

Thank you!

1 Like

No matter what I do, this problem persists. Does the Google team have no comment or solution regarding this? This is a very problematic situation.

This problem is still happening. And it’s a big problem for me and some of the users. Any solutions or reasons why it’s happening?