Gemini API - Special Character Problem

Goktug_Yurt · December 13, 2025, 7:02am

Hello,

I have an agent that performs real-time analysis of videos uploaded to specific YouTube channels. I send the video link and some instructions along with the prompt to the following endpoint (Gemini API):

https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent

It had been working without issues for a long time. However, I started getting an error like the one below with Gemini 2.5 Pro. I then switched to Gemini 3 Pro thinking it was a model-related issue, but the error persists. From what I understand, there’s a problem related to Turkish characters and UTF-8.

This system runs on n8n. And the problem is this doesn’t happen with all requests. Some of them returns an unreadable text. I would really appreciate it if you could help.

The Prompt that I use:

**return {
contents: [
{
role: “user”,
parts: [
{
fileData: {
mimeType: “video/*”,
fileUri: $input.first().json.videoUrl
}
},
{
text: `This is a long prompt so I removed it`
}
]
}
],
generationConfig: {
responseMimeType: “text/plain”
}
};

This is what I get from Gemini (http request):**

AJet鈥檌n yurt d谋艧谋 u莽u艧lar谋 i莽in g眉ncellenen bagaj prosed眉rlerini, limitlerini ve bilet segmentlerine g枚re (Basic, EcoJet, Flex, Premium) de臒i艧en bagaj haklar谋n谋 yolculara net bir 艧ekilde 枚臒retmek.\n\n**RTB:**\nK眉莽眉k 莽anta, kabin bagaj谋 ve kay谋tl谋 bagaj i莽in sunulan kesin 枚l莽眉ler (cm), a臒谋rl谋k limitleri (kg) ve bilet paketlerinin kapsam谋n谋 g枚steren detayl谋 kar艧谋la艧t谋rma tablosu.

Mustafa_Mahdi · December 13, 2025, 7:37am

The issue seems to be UTF-8 encoding of Turkish characters. Ensuring all text is properly UTF-8 encoded should fix the unreadable text errors.

Goktug_Yurt · December 13, 2025, 8:26am

Yes that’s the problem but I am trying to understand what triggers this issue after working for a year without a problem.

Http request asks for a raw json format. Changing this into json may solve the issue. Just thinking aloud.

Goktug_Yurt · December 13, 2025, 4:26pm

I tried many things and settings and parameters and I think this is a Gemini endpoint bug.

Problem Description

The API sometimes returns Turkish text with character encoding corruption, where UTF-8 Turkish characters are incorrectly encoded as Chinese characters. This appears to be a server-side encoding issue affecting approximately 30-40% of requests.

I believe some of the servers are generating corrupted characters since there are several servers dealing with my http request. Some of them works fine, some of them not.

Affected Endpoint

POST https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent

Character Corruption Examples

Expected (UTF-8)	Received (Corrupted)	Hex Bytes (Expected)	Hex Bytes (Received)
ı (Turkish i)	谋 (Chinese)	`c4 b1`	`e8 b0 8b`
ş (Turkish s)	艧 (Chinese)	`c5 9f`	`e8 89 a7`
ü (Turkish u)	眉 (Chinese)	`c3 bc`	Unknown
ç (Turkish c)	莽 (Chinese)	`c3 a7`	Unknown
ğ (Turkish g)	臒 (Chinese)	`c4 9f`	Unknown
ö (Turkish o)	枚 (Chinese)	`c3 b6`	Unknown

Request Characteristics

Input Language: Turkish (UTF-8 encoded prompt)
Input Type: Video analysis with text prompt
Response Language: Turkish (expected)
Frequency: ~30-40% of requests return corrupted encoding
Pattern: Appears to be load-balancer or regional server dependent

Example 1: Corrupted Response

{
“candidates”: [
{
“content”: {
“parts”: [
{
“text”: “Stratejik Mesaj:\nAJet鈥檌n yurt d谋艧谋 u莽u艧lar谋 i莽in g眉ncellenen bagaj prosed眉rlerini…”
}
]
}
}
]
}

Decoded Text (corrupted):

“AJet’in yurt d谋艧谋 u莽u艧lar谋 i莽in g眉ncellenen bagaj prosed眉rlerini..”

Expected Text:

“AJet’in yurt dışı uçuşları için güncellenen bagaj prosedürlerini…”

Encoding Corruption Pattern

The corruption suggests the response is being:

Generated correctly as UTF-8
Incorrectly interpreted/transcoded through Big5 or GB2312 encoding
Re-encoded as UTF-8 with Chinese character mappings

Byte-Level Evidence

Turkish “dışı” should be: 64 c4 b1 c5 9f c4 b1 (UTF-8)

Instead received: 64 e8 b0 8b e8 89 a7 e8 b0 8b (UTF-8 bytes representing Chinese chars)

This indicates a double-encoding issue or incorrect charset interpretation on certain backend servers.

Impact

Severity: High - Makes Turkish language responses unusable
Workaround: Client-side character mapping (inefficient and fragile)
Business Impact: Prevents production use of Gemini API for Turkish content analysis

I hope there will be a solution soon or the problem is somewhere else where I can fix it personally.

Does anyone have the same issue?

Goktug_Yurt · December 14, 2025, 7:26am

Hi again,

adding charset parameter for the http request seems to be solution for this problem;

application/json; charset=utf-8

But I still want to understand where the problem is coming from. If it’s coming from Gemini endpoint this is a serious bug started to happen in the last 2 weeks and needs to be resolved without a workaround like charset parameter.

Goktug_Yurt · December 17, 2025, 7:58am

It worked for a few days without any issues, but the character encoding problem has started again now. In short, the issue wasn’t resolved with the charset=utf-8 parameter.

If anyone has knowledge about this, I’d appreciate it if you could let me know.

Goktug_Yurt · December 17, 2025, 3:16pm

This is getting crazy.

Gemini App itself sent me an answer with corrupted characters. This issue is not about http request, API or n8n. This is something behind the scenes within Gemini itself.

I’d appreciate if someone can give me an advice because this is a high impact issue for me =/

Goktug_Yurt · December 22, 2025, 9:22pm

Still happening. Any clue anyone =/

CMD_Proton · December 23, 2025, 6:21am

hey thought this might help I red teamed it hope it works as a temp solution

Gemini API – Special Character / Encoding Issue (Turkish UTF-8)

TL;DR

Not a prompt or client bug

UTF-8 Turkish characters are intermittently misinterpreted as GBK on some backend nodes

Example: ı (C4 B1) → interpreted as GBK → 谋

Occurs ~30–40% of requests; reproduced in the Gemini app itself

Practical workaround:

Prevent with ASCII-only Unicode escapes in JSON

Detect + retry if corruption slips through

What’s happening (short version)

We traced this to classic mojibake occurring after model generation, during transport or middleware handling.

Concrete proof:

Turkish ı → UTF-8 bytes: 0xC4 0xB1

Those bytes interpreted as GBK → Chinese character 谋

Re-encoded to UTF-8 → corrupted output downstream

The byte mapping is exact. This strongly suggests that a subset of backend instances is decoding UTF-8 responses as GBK. The issue appearing inside the Gemini app rules out client-side parsing.

Strategy 1: ASCII Shield (Prevention)

If the response never contains multi-byte characters, the misconfigured node has nothing to corrupt.

System instruction (place near the end for emphasis):

> IMPORTANT OUTPUT RULE:

Output valid JSON only. For all non-ASCII characters (especially Turkish: ı, ş, ğ, ü, ö, ç), use Unicode escape sequences.

Example:

{“text”:“d\u0131\u015f\u0131”}

not

{“text”:“dışı”}

Why it works:

The payload stays pure ASCII during transport. When the client parses JSON, the escapes decode back into correct Turkish characters automatically.

Strategy 2: GBK Repair (Detection + Cure)

LLMs can occasionally slip on strict formatting, so it’s useful to detect and recover.

Python

def robust_repair(text):

\# If no CJK-range characters, leave it alone

if not any('\\u4e00' <= char <= '\\u9fff' for char in text):

    return text

try:

    \# Invert the exact UTF-8 → GBK misinterpretation path observed

    return text.encode('gbk').decode('utf-8')

except (UnicodeEncodeError, UnicodeDecodeError):

    return text

JavaScript (n8n Code node)

function robustRepair(text) {

const hasChineseChars = /\[\\u4e00-\\u9fff\]/.test(text);

if (!hasChineseChars) return text;

// n8n lacks native GBK decoding

// Recommended: flag, retry request, and log sample

return {

    text,

    corrupted: true,

    needsRetry: true

};

}

n8n pattern: detect → retry once → log.

Avoid partial repairs in JS without proper charset libraries.

Recommended setup

Use both approaches:

1. ASCII Shield to prevent corruption

2. Detection to catch rare failures

3. Retry once before logging or repair

This turns an intermittent encoding ghost into a controlled, observable failure mode.

Note for Google engineers (if anyone from the team is reading)

This appears to be a middleware charset issue, not a model issue:

UTF-8 bytes (C4 B1) interpreted as GBK (谋)

~30–40% occurrence suggests specific backend instances

Reproduced in the Gemini app itself

Comparing response headers / request IDs between good and corrupted responses may help isolate the affected pool quickly.

Hope this helps unblock things while an official fix is in flight.

~ Cmd.Proton & The Lab

Goktug_Yurt · December 23, 2025, 1:34pm

Thank you so much @CMD_Proton & The Lab for this detailed analysis! This confirms exactly what I was seeing.

the byte mapping explanation (UTF-8 → GBK → mojibake) makes perfect sense now. I’ll try to implement the ASCII Shield approach in my prompts and add the GBK repair function as a fallback in my workflow. Really appreciate you taking the time to red team this. Hope Google addresses this soon.

Thank you!

Goktug_Yurt · December 31, 2025, 6:49am

No matter what I do, this problem persists. Does the Google team have no comment or solution regarding this? This is a very problematic situation.

Goktug_Yurt · January 17, 2026, 9:17am

This problem is still happening. And it’s a big problem for me and some of the users. Any solutions or reasons why it’s happening?

Goktug_Yurt · April 27, 2026, 5:10am

Still happening……….,

Topic		Replies	Views
Characters with accents are generated in lots of wrong ways Gemini API gemini-flash	7	470	January 8, 2026
Random Endless \n Output in Gemini API 1.5 Pro Responses Gemini API gemini-15 , model	18	1757	April 2, 2026
Gemini 2.0 Flash Non UTF-8 Encoding Issue Gemini API api , gemini , api-key	3	196	July 11, 2025
Gemini 2.0 Flash fails to generate a structured output Gemini API bug , api , vertexai , gemini-20	3	576	July 30, 2025
Interactions API: Spanner UTF-8 error when generating Korean response after tool calls Gemini API api , gemini-api , gemini-flash	6	137	January 6, 2026