Query on mixed-language outputs in the google/gemma-3n-4b-it model

Dibyajyoti_Mishra · October 27, 2025, 2:00am

Hi,

We’re currently using the google/gemma-3n-4b-it model for translation tasks, but we’ve noticed frequent mixed-language outputs — partial sentences containing words from other scripts or languages, even with strict target-language prompts.

Could anyone please advise on how to mitigate this issue?

Any guidance on prompting strategies, fine-tuning approaches, or decoding parameters (e.g., temperature, top_p, etc.) that could help reduce this would be greatly appreciated.

Sonali_Kumari1 · October 28, 2025, 8:36am

Hi @Dibyajyoti_Mishra, welcome to the Google AI Forum!

To help with the mixed-language output issue you are facing, could you please clarify the exact model name you are using? Is it google/gemma-3-4b-it or google/gemma-3n-E4B-it? Could you also provide an example of the input text, your prompt, and the resulting mixed-language output, along with a reproducible code snippet? Thanks!

Dibyajyoti_Mishra · October 29, 2025, 1:50am

Hi @Sonali_Kumari1,

Here’s a reproducible example of the mixed-language issue we’re observing with the gemma-3n-e4b-it model.

Setup:

Source language: English
Target language: Serbian
Prompt used:

You are a professional translation assistant.
Task: Translate the text strictly from {source_lang} to {target_lang}.
- Always translate the entire sentence without omitting or cutting off words.
- Translate place names, proper nouns, and idiomatic phrases into their correct {target_lang} form.
- Do not invent or hallucinate words. Keep meaning faithful to the source.
- Never mix scripts. The output must be 100% in {target_lang} script.

### {source_lang} Text:
{text}
### {target_lang} Translation:

Input sentences:

1. Education is the foundation of progress.
2. Climate change is real, its impact is visible.
3. True leadership is not about power, but service.

Model outputs:

[1] Education is the foundation of progress. → Образовање је темељ развоја.   (Serbian)
[2] Climate change is real, its impact is visible. → Klima promene su realne, njihov uticaj je vidljiv.   (Croatian)
[3] True leadership is not about power, but service. → Stvarno vođstvo nije o moći, već o služenju.   (Bosnian)

Even though the target language is strictly set to Serbian, the model produces translations in similar regional languages (Croatian and Bosnian) for some sentences.

We have tested this with over 100 source–target language pairs using the same prompt, and this kind of language mixing issue occurs in more than 20 languages (e.g., outputs mixing Malayalam–Tamil, Hindi–Urdu, or Arabic–Persian).
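For anyone triaging a large batch like this, a rough script check can flag suspicious outputs automatically. The sketch below is only an illustration, not part of our pipeline: the helper names `script_counts` and `is_single_script` are hypothetical, and it relies on the convention that a Unicode character name starts with its script (for example "CYRILLIC SMALL LETTER A"). It detects script mixing only; it cannot tell apart languages that share a script, such as Arabic and Persian.

```python
import unicodedata
from collections import Counter


def script_counts(text: str) -> Counter:
    """Count letters per script, keyed by the first word of the Unicode name."""
    counts = Counter()
    for ch in text:
        # Skip digits, punctuation, whitespace, and combining marks.
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts


def is_single_script(text: str, expected: str) -> bool:
    """True if the text has letters and all of them belong to `expected`."""
    counts = script_counts(text)
    return bool(counts) and set(counts) == {expected}


if __name__ == "__main__":
    outputs = [
        "Образовање је темељ развоја.",
        "Klima promene su realne, njihov uticaj je vidljiv.",
    ]
    for out in outputs:
        print(is_single_script(out, "CYRILLIC"), dict(script_counts(out)))
```

With "CYRILLIC" as the expected script, output [1] above passes and outputs [2] and [3] are flagged because they are written in Latin script.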

Please review this issue and help resolve it when possible.

Sonali_Kumari1 · November 5, 2025, 7:37am

Hi @Dibyajyoti_Mishra,
Apologies for the delayed response. I was able to reproduce the mixed-language output issue using the gemma-3n-e4b-it model.

Using your prompt and input sentences, I observed a translation that appears closer to Croatian and Bosnian:

However, after adding a few-shot example and explicitly instructing the model to output in Serbian Cyrillic script, here is the output:

Attaching gist for your reference. Please let me know if this is helpful. Thanks!
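A minimal sketch of the approach described above (few-shot examples plus an explicit script instruction). This is not the content of the gist: the function name `build_few_shot_prompt` and its parameters are hypothetical, and the single example pair is taken from the outputs earlier in this thread.

```python
def build_few_shot_prompt(source_lang, target_lang, script, examples, text):
    """Assemble a translation prompt with an explicit script and few-shot pairs."""
    lines = [
        "You are a professional translation assistant.",
        f"Task: Translate the text strictly from {source_lang} to {target_lang}.",
        f"The output must be written only in the {script} script.",
        "",
    ]
    # Each example shows the model the exact input/output layout to continue.
    for src, tgt in examples:
        lines += [
            f"{source_lang} Text: {src}",
            f"{target_lang} Translation: {tgt}",
            "",
        ]
    # Leave the final translation slot empty for the model to fill in.
    lines += [f"{source_lang} Text: {text}", f"{target_lang} Translation:"]
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        "English",
        "Serbian",
        "Serbian Cyrillic",
        [("Education is the foundation of progress.", "Образовање је темељ развоја.")],
        "Climate change is real, its impact is visible.",
    )
    print(prompt)
```

Naming the script ("Serbian Cyrillic") rather than only the language matters here because Serbian can be written in both Cyrillic and Latin, so "Serbian" alone leaves the script ambiguous.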

Dibyajyoti_Mishra · November 11, 2025, 10:48am

Yes, I tested the Serbian translation, and it’s working now.
Currently, I have tested the following language pairs with sentences in formal, informal, and casual tones (using example-based prompts as you provided):
English → Serbian, English → Malayalam (NA), English → Punjabi (Gurmukhi), English → Sanskrit, English → Meiteilon (Manipuri) (NA), English → Amharic (NA), English → Burmese, English → Armenian (NA), English → Khmer (NA), English → Pashto (NA), English → Georgian (NA).

However, I am receiving mixed-language outputs for some of them.

For example, using the following Georgian system prompt:

system_prompt = (
    """You are a professional translation assistant.
    Task: Translate the text strictly from English to Georgian.
    The output must be in the Georgian script.
    
    English Text:
    Education is the foundation of progress.
    Georgian Translation: განათლება პროგრესის საფუძველია.

    English Text: How are you?
    Georgian Translation: როგორ ხარ?

    English Text: Knowledge is power.
    Georgian Translation: ცოდნა ძალაა.

    English Text: Let’s go for a walk.
    Georgian Translation: წავიდეთ გასეირნებლად.

    Now translate carefully and output only the Georgian translation.
    """
)

When testing, I received the following mixed-language outputs:
Input: “Shoot me that report when you can, yeah? ”
Output: “გაส่ง لي tę నివేదిక, როდესაც შეძლებ, კი?”

Input: “Could you please share the final report with me by this evening? ”
Output: “უკვე მომაწოდეთostat report вечера સુધીમાં.”

Kindly review this issue — it appears that the model is producing mixed-script or multilingual outputs instead of translating purely into Georgian script.
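As a stopgap while this is open, outputs like the two above can be caught and retried on the client side. The sketch below is an assumption-laden illustration: `generate` is a hypothetical callable standing in for whatever inference call is in use, the temperature schedule is arbitrary, and nothing in this thread establishes that a lower temperature fixes the problem for this model. The code point ranges are the Unicode Georgian, Georgian Extended, and Georgian Supplement blocks.

```python
from typing import Callable, Optional, Sequence

# Unicode blocks: Georgian, Georgian Extended, Georgian Supplement.
GEORGIAN_RANGES = ((0x10A0, 0x10FF), (0x1C90, 0x1CBF), (0x2D00, 0x2D2F))


def is_georgian_letter(ch: str) -> bool:
    """True if the character's code point lies in a Georgian block."""
    return any(lo <= ord(ch) <= hi for lo, hi in GEORGIAN_RANGES)


def is_pure_georgian(text: str) -> bool:
    """True if the text contains letters and every letter is Georgian."""
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all(is_georgian_letter(ch) for ch in letters)


def translate_with_retry(
    generate: Callable[[str, float], str],  # hypothetical: (text, temperature) -> output
    text: str,
    temperatures: Sequence[float] = (0.7, 0.3, 0.0),
) -> Optional[str]:
    """Retry at successively lower temperatures until the output is pure Georgian."""
    for temperature in temperatures:
        output = generate(text, temperature)
        if is_pure_georgian(output):
            return output
    return None  # every attempt contained non-Georgian letters
```

Returning `None` instead of the last attempt keeps mixed-script text out of downstream data; callers can then log the failure or route the sentence to a different model.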

Sonali_Kumari1 · January 30, 2026, 5:55am

Hi @Dibyajyoti_Mishra, apologies for the delayed response.

We now have TranslateGemma models, designed to handle translation tasks across 55 languages. You can also fine-tune the model for many languages. Please refer to the Technical Report and the model card for more details.

Thank you!
