Query on mixed-language outputs in the google/gemma-3n-4b-it model

Hi,

We’re currently using the google/gemma-3n-4b-it model for translation tasks, but we’ve noticed frequent mixed-language outputs — partial sentences containing words from other scripts or languages, even with strict target-language prompts.

Could anyone please advise on how to mitigate this issue?

Any guidance on prompting strategies, fine-tuning approaches, or decoding parameters (e.g., temperature, top_p, etc.) that could help reduce this would be greatly appreciated.

Hi @Dibyajyoti_Mishra, welcome to the Google AI Forum!

To help with the mixed-language output issue you are facing, could you please clarify the exact model name you are using? Is it google/gemma-3-4b-it or google/gemma-3n-E4B-it? Also, could you share an example of the input text, your prompt, and the resulting mixed-language output, along with a reproducible code snippet? Thanks!

Hi @Sonali_Kumari1,

Here’s a reproducible example of the mixed-language issue we’re observing with the gemma-3n-e4b-it model.

Setup:

  • Source language: English

  • Target language: Serbian

  • Prompt used:

You are a professional translation assistant.
Task: Translate the text strictly from {source_lang} to {target_lang}.
- Always translate the entire sentence without omitting or cutting off words.
- Translate place names, proper nouns, and idiomatic phrases into their correct {target_lang} form.
- Do not invent or hallucinate words. Keep meaning faithful to the source.
- Never mix scripts. The output must be 100% in {target_lang} script.

### {source_lang} Text:
{text}
### {target_lang} Translation:

Input sentences:

1. Education is the foundation of progress.
2. Climate change is real, its impact is visible.
3. True leadership is not about power, but service.

Model outputs:

[1] Education is the foundation of progress. → Образовање је темељ развоја.   (Serbian)
[2] Climate change is real, its impact is visible. → Klima promene su realne, njihov uticaj je vidljiv.   (Croatian)
[3] True leadership is not about power, but service. → Stvarno vođstvo nije o moći, već o služenju.   (Bosnian)

Even though the target language is strictly set to Serbian, the model produces translations in similar regional languages (Croatian and Bosnian) for some sentences.

We have tested this with over 100 source–target language pairs using the same prompt, and this kind of language mixing issue occurs in more than 20 languages (e.g., outputs mixing Malayalam–Tamil, Hindi–Urdu, or Arabic–Persian).
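For checking a large batch of language pairs like this, a small script check can flag outputs that drift away from the expected script. The following is only a minimal sketch, not part of our pipeline: the helper names `script_counts` and `is_single_script` are hypothetical, and the script is approximated by the first word of each character's Unicode name. It catches script drift only. It cannot separate languages that share a script (for example Arabic and Persian), and since Serbian is written in both Cyrillic and Latin, a Latin-script output is not by itself proof of a different language.

```python
import unicodedata
from collections import Counter


def script_counts(text: str) -> Counter:
    """Count alphabetic characters per script.

    The script is approximated by the first word of the Unicode
    character name (e.g. 'CYRILLIC', 'LATIN', 'DEVANAGARI').
    This is a heuristic, not a full Unicode script lookup.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts


def is_single_script(text: str, expected: str) -> bool:
    """Return True if every alphabetic character belongs to `expected`."""
    counts = script_counts(text)
    return bool(counts) and set(counts) == {expected}


if __name__ == "__main__":
    outputs = [
        "Образовање је темељ развоја.",
        "Klima promene su realne, njihov uticaj je vidljiv.",
        "Stvarno vođstvo nije o moći, već o služenju.",
    ]
    for out in outputs:
        # Print the per-script counts and whether the output is pure Cyrillic.
        print(dict(script_counts(out)), is_single_script(out, "CYRILLIC"))
```

Outputs that fail the check can then be retried or routed to manual review.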

Please review this issue and help resolve it when possible.

Hi @Dibyajyoti_Mishra,
Apologies for the delayed response. I was able to reproduce the mixed-language output issue using the gemma-3n-e4b-it model.

Using your prompt and input sentences, I observed a translation that appears closer to Croatian and Bosnian:

However, after adding a few-shot example and explicitly instructing the model to output in Serbian Cyrillic script, here is the output:

Attaching gist for your reference. Please let me know if this is helpful. Thanks!
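The gist contents are not reproduced in this thread, so the following is only a rough sketch of the approach described above (one few-shot example plus an explicit script instruction), not the exact prompt that was used. The function name `build_prompt`, its parameters, and the wording of the script instruction are assumptions. The few-shot pair reuses sentence [1] from the original post, which the model already rendered in Cyrillic.

```python
# One few-shot pair taken from the thread: the source sentence and the
# Cyrillic output the model produced for it.
FEW_SHOT = [
    ("Education is the foundation of progress.", "Образовање је темељ развоја."),
]


def build_prompt(text, source_lang="English", target_lang="Serbian",
                 script="Cyrillic", examples=FEW_SHOT):
    """Build a translation prompt with few-shot examples and a script rule."""
    lines = [
        "You are a professional translation assistant.",
        f"Task: Translate the text strictly from {source_lang} to {target_lang}.",
        # Assumed wording: name the script explicitly, not just the language.
        f"- Write the output only in {target_lang} {script} script.",
        "",
    ]
    # Each example uses the same section markers as the real query.
    for src, tgt in examples:
        lines += [
            f"### {source_lang} Text:", src,
            f"### {target_lang} Translation:", tgt,
            "",
        ]
    # The final block is left open so the model completes the translation.
    lines += [f"### {source_lang} Text:", text, f"### {target_lang} Translation:"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("Climate change is real, its impact is visible."))
```

The resulting string can be passed to the model as the user message. Naming the script explicitly matters for languages such as Serbian that are written in more than one script.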