The voice generation system produces inconsistent results across multiple generations. When generating audio for the same character, the second output often differs significantly from the first. This includes changes in pitch, tone, emotion, sound quality, and even the overall identity of the voice. Voice characteristics should remain stable across generations.
Additionally, the model does not correctly interpret Amharic heteronyms. Amharic contains many words that share the same spelling but have different meanings depending on pronunciation. The model currently selects meanings inconsistently, often at random, because it does not appear to use contextual cues during translation or text-to-speech processing.
Examples of affected Amharic heteronyms:
- ገና
  - gena: "still / yet"
  - genna: "Christmas / day of glory"
- ለጋ
  - lega: "fresh / young"
  - legga: "hit" (verb)
- ዋና
  - wana: "swimming"
  - wanna: "main / core"
- ሽፍታ
  - shifta: "rebel"
  - shiffta: "rash" (skin condition)
- ከፋ
  - kefa: place name
  - keffa: "worse"
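To make the heteronym problem easier to track across regenerations, the pairs listed above could be kept as a small regression fixture. The following is only a sketch: `HETERONYMS`, `inconsistent_readings`, and `unknown_readings` are hypothetical names, and it assumes a reviewer (or some transcription step) has already labeled which reading each generation produced. It does not call the voice system itself.

```python
# Heteronym pairs copied from the report, keyed by the first romanization.
# Each value holds (romanization, gloss) for the two readings of one spelling.
HETERONYMS = {
    "gena": (("gena", "still / yet"), ("genna", "Christmas / day of glory")),
    "lega": (("lega", "fresh / young"), ("legga", "hit (verb)")),
    "wana": (("wana", "swimming"), ("wanna", "main / core")),
    "shifta": (("shifta", "rebel"), ("shiffta", "rash (skin condition)")),
    "kefa": (("kefa", "place name"), ("keffa", "worse")),
}


def inconsistent_readings(runs):
    """Return heteronym keys whose labeled reading differs between runs.

    `runs` is a list of dicts, one per generation of the same text,
    mapping a heteronym key to the romanization the reviewer heard.
    """
    flagged = []
    for key in HETERONYMS:
        chosen = {run[key] for run in runs if key in run}
        if len(chosen) > 1:
            flagged.append(key)
    return flagged


def unknown_readings(run):
    """Return (key, label) pairs that match neither documented reading."""
    bad = []
    for key, label in run.items():
        valid = {romanization for romanization, _ in HETERONYMS.get(key, ())}
        if label not in valid:
            bad.append((key, label))
    return bad
```

For example, labeling two generations of the same sentence as `{"gena": "gena"}` and `{"gena": "genna"}` would flag `gena` as inconsistent, which is the behavior this report describes.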
For comparison, this issue is similar to English heteronyms such as lead (verb) vs. lead (metal) or tear (cry) vs. tear (rip), where meaning depends entirely on context. The current system does not reliably apply contextual logic when interpreting these words.
Impact:
- Incorrect translations and voice outputs
- Loss of meaning in Amharic text
- Unreliable character consistency in audio
- Overall degradation of user experience for Amharic speakers
Expected Behavior:
- Voice characteristics (pitch, tone, emotion, identity) should remain consistent across generations for the same character.
- The model should accurately interpret Amharic heteronyms based on context, similar to how it handles English heteronyms.
- Audio output should apply a stronger and more natural Amharic accent when generating Amharic speech.
Steps to Reproduce:
- Generate audio for any character using Amharic text.
- Regenerate the same line or paragraph using the same character.
- Compare the voice quality and interpretation of heteronyms between generations.
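The comparison step can be made less subjective by measuring one characteristic numerically. The sketch below estimates the pitch of two clips with a simple autocorrelation peak search and reports the difference in semitones. It is an illustration under assumptions: the function names are hypothetical, the clips are assumed to be already decoded into lists of mono samples, and synthetic tones stand in for real generations because the system's API is not described in this report. Pitch is only one of the affected characteristics; tone, emotion, and identity would need other measures.

```python
import math


def sine(freq_hz, sample_rate, seconds):
    """Synthetic stand-in for a decoded clip: a pure tone as a list of floats."""
    n = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def estimate_f0(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate fundamental frequency (Hz) from the strongest autocorrelation lag."""
    n = len(samples)
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), n - 1)
    if min_lag < 1 or max_lag <= min_lag:
        raise ValueError("clip too short for the requested pitch range")
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove DC offset before correlating
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(x[i] * x[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def pitch_drift_semitones(first, second, sample_rate):
    """Signed pitch difference between two clips, in semitones."""
    f_first = estimate_f0(first, sample_rate)
    f_second = estimate_f0(second, sample_rate)
    return 12 * math.log2(f_second / f_first)


if __name__ == "__main__":
    rate = 8000
    gen_1 = sine(120, rate, 0.25)  # stands in for the first generation
    gen_2 = sine(180, rate, 0.25)  # stands in for the regenerated clip
    print(f"pitch drift: {pitch_drift_semitones(gen_1, gen_2, rate):.2f} semitones")
```

With real output, the two sample lists would come from the first and second generation of the same line. A drift near zero suggests stable pitch, while a drift of several semitones would support the inconsistency described above.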
Additional Request:
If possible, please enhance the Amharic accent realism and consistency in generated speech.