Query on mixed-language outputs in the google/gemma-3n-4b-it model

Hi,

We’re currently using the google/gemma-3n-4b-it model for translation tasks, but we’ve noticed frequent mixed-language outputs: sentences that contain words from other scripts or languages, even with strict target-language prompts.

Could anyone please advise on how to mitigate this issue?

Any guidance on prompting strategies, fine-tuning approaches, or decoding parameters (e.g., temperature, top_p, etc.) that could help reduce this would be greatly appreciated.

Hi @Dibyajyoti_Mishra, welcome to the Google AI Forum!

To help with the mixed-language output issue you are facing, could you please clarify the exact model name you are using? Is it google/gemma-3-4b-it or google/gemma-3n-E4B-it? Also, could you provide an example of the input text, your prompt, and the resulting mixed-language output, along with a reproducible code snippet? Thanks!

Hi @Sonali_Kumari1,

Here’s a reproducible example of the mixed-language issue we’re observing with the gemma-3n-e4b-it model.

Setup:

  • Source language: English

  • Target language: Serbian

  • Prompt used:

You are a professional translation assistant.
Task: Translate the text strictly from {source_lang} to {target_lang}.
- Always translate the entire sentence without omitting or cutting off words.
- Translate place names, proper nouns, and idiomatic phrases into their correct {target_lang} form.
- Do not invent or hallucinate words. Keep meaning faithful to the source.
- Never mix scripts. The output must be 100% in {target_lang} script.

### {source_lang} Text:
{text}
### {target_lang} Translation:

Input sentences:

1. Education is the foundation of progress.
2. Climate change is real, its impact is visible.
3. True leadership is not about power, but service.

Model outputs:

[1] Education is the foundation of progress. → Образовање је темељ развоја.   (Serbian)
[2] Climate change is real, its impact is visible. → Klima promene su realne, njihov uticaj je vidljiv.   (Croatian)
[3] True leadership is not about power, but service. → Stvarno vođstvo nije o moći, već o služenju.   (Bosnian)

Even though the target language is strictly set to Serbian, the model produces translations in similar regional languages (Croatian and Bosnian) for some sentences.
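For anyone trying to reproduce this, the setup above can be sketched roughly as follows. This only shows how the template placeholders are filled; the names `PROMPT_TEMPLATE` and `build_prompt` are illustrative and not from the original script, and the call to the model itself is left out.

```python
# Prompt template copied from the post; the placeholders are filled per request.
PROMPT_TEMPLATE = """You are a professional translation assistant.
Task: Translate the text strictly from {source_lang} to {target_lang}.
- Always translate the entire sentence without omitting or cutting off words.
- Translate place names, proper nouns, and idiomatic phrases into their correct {target_lang} form.
- Do not invent or hallucinate words. Keep meaning faithful to the source.
- Never mix scripts. The output must be 100% in {target_lang} script.

### {source_lang} Text:
{text}
### {target_lang} Translation:
"""


def build_prompt(text, source_lang="English", target_lang="Serbian"):
    """Fill the template for one sentence (hypothetical helper name)."""
    return PROMPT_TEMPLATE.format(
        source_lang=source_lang, target_lang=target_lang, text=text
    )


sentences = [
    "Education is the foundation of progress.",
    "Climate change is real, its impact is visible.",
    "True leadership is not about power, but service.",
]
prompts = [build_prompt(s) for s in sentences]
```

Note that the template names only the language ("Serbian"), not a specific script, so "100% in Serbian script" can be read as either Cyrillic or Latin.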

We have tested this with over 100 source–target language pairs using the same prompt, and this kind of language mixing issue occurs in more than 20 languages (e.g., outputs mixing Malayalam–Tamil, Hindi–Urdu, or Arabic–Persian).

Please review this issue and help resolve it when possible.

Hi @Dibyajyoti_Mishra,
Apologies for the delayed response. I was able to reproduce the mixed-language output issue using the gemma-3n-e4b-it model.

Using your prompt and input sentences, I observed a translation that appears closer to Croatian and Bosnian:

However, after adding a few-shot example and explicitly instructing the model to output in Serbian Cyrillic script, here is the output:

Attaching gist for your reference. Please let me know if this is helpful. Thanks!
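The gist itself is not reproduced in this thread, so the following is only a minimal sketch of what a few-shot prompt with an explicit script instruction might look like. The helper name `build_few_shot_prompt` and the exact wording are assumptions; the single example pair reuses the Cyrillic output for sentence [1] reported earlier in the thread.

```python
# One example pair taken from the thread (sentence [1] and its Cyrillic output).
FEW_SHOT_EXAMPLES = [
    ("Education is the foundation of progress.", "Образовање је темељ развоја."),
]


def build_few_shot_prompt(text, examples=FEW_SHOT_EXAMPLES,
                          target="Serbian", script="Cyrillic"):
    """Build a prompt that names the script explicitly and shows examples."""
    lines = [
        "You are a professional translation assistant.",
        f"Task: Translate the text strictly from English to {target}.",
        f"The output must be in the {target} {script} script.",
        "",
    ]
    # Each example shows the model the expected language and script.
    for source, translation in examples:
        lines += [f"English Text: {source}",
                  f"{target} Translation: {translation}",
                  ""]
    # The sentence to translate goes last, with the answer slot left open.
    lines += [f"English Text: {text}", f"{target} Translation:"]
    return "\n".join(lines)


prompt = build_few_shot_prompt("Climate change is real, its impact is visible.")
```

In practice more than one example pair would likely be used; only one is shown here to avoid inventing translations that are not in the thread.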

Yes, I tested the Serbian translation, and it’s working now.
Currently, I have tested the following language pairs with sentences in formal, informal, and casual tones (using example-based prompts as you provided):
English → Serbian, English → Malayalam (NA), English → Punjabi (Gurmukhi), English → Sanskrit, English → Meiteilon (Manipuri) (NA), English → Amharic (NA), English → Burmese, English → Armenian (NA), English → Khmer (NA), English → Pashto (NA), English → Georgian (NA).

However, I am receiving mixed-language outputs for some of them.

For example, using the following Georgian system prompt:

system_prompt = (
    """You are a professional translation assistant.
    Task: Translate the text strictly from English to Georgian.
    The output must be in the Georgian script.
    
    English Text:
    Education is the foundation of progress.
    Georgian Translation: განათლება პროგრესის საფუძველია.

    English Text: How are you?
    Georgian Translation: როგორ ხარ?

    English Text: Knowledge is power.
    Georgian Translation: ცოდნა ძალაა.

    English Text: Let’s go for a walk.
    Georgian Translation: წავიდეთ გასეირნებლად.

    Now translate carefully and output only the Georgian translation.
    """
)

When testing, I received the following mixed-language outputs:
Input: “Shoot me that report when you can, yeah?”
Output: “გაส่ง لي tę నివేదిక, როდესაც შეძლებ, კი?”

Input: “Could you please share the final report with me by this evening?”
Output: “უკვე მომაწოდეთostat report вечера સુધીમાં.”

Kindly review this issue. The model appears to be producing mixed-script, multilingual outputs instead of translating purely into Georgian script.
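Outputs like the ones above can be flagged automatically. Below is a minimal sketch using only Python's standard `unicodedata` module: it approximates each character's script by the first word of its Unicode name (for example `GEORGIAN`, `LATIN`, `THAI`). The function names are illustrative. This heuristic catches mixed scripts only; it cannot tell apart languages that share a script, such as Serbian, Croatian, and Bosnian written in Latin.

```python
import unicodedata


def scripts_used(text):
    """Return the set of script names found among letters and marks in text."""
    found = set()
    for ch in text:
        # Only letters (L*) and combining marks (M*) carry script information;
        # digits, punctuation, and whitespace are ignored.
        if unicodedata.category(ch)[0] in ("L", "M"):
            name = unicodedata.name(ch, "")
            if name:
                # Heuristic: the first word of the Unicode name is the script.
                found.add(name.split()[0])
    return found


def is_single_script(text, expected):
    """True if every letter in text belongs to the expected script."""
    return scripts_used(text) <= {expected}


# A clean example from the prompt vs. a mixed Georgian/Latin string.
clean_ok = is_single_script("ცოდნა ძალაა.", "GEORGIAN")
mixed_ok = is_single_script("ცოდნა report", "GEORGIAN")
```

A check like this could be used to reject and retry a generation before it reaches users, though it does not address why the model mixes scripts in the first place.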

Hi @Dibyajyoti_Mishra, Apologies for the delayed response.

We now have TranslateGemma models, which are designed to handle translation tasks across 55 languages. You can also fine-tune the model for many languages. Please refer to the Technical Report and the model card for more details.

Thank you!
