Environment: Google AI Studio, Gemini 3.1 Pro, NotebookLM Audio Generation
Issue: Generative Audio Tokenization Failure for Pashto
Pashto is listed as a supported language, but the text-to-speech engine does not ground its output in Pashto phonology. It appears to treat Pashto as a variant of Urdu or Arabic, which strips out the language's native retroflex consonants.
Technical Breakdown of Failure
Pashto relies on retroflex consonants and alveolar affricates (څ /ts/ and ځ /dz/). The letter ښ represents a voiceless retroflex fricative /ʂ/ in Kandahari and a voiceless velar fricative /x/ in Peshawari. The letter ړ represents a retroflex flap /ɽ/. The current models fail to render these phonemes. The system also fails to distinguish the five ye letters of Pashto orthography (ي, ې, ی, ۍ, ئ).
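As an illustration of the mapping described above, the following is a minimal sketch of a dialect-aware grapheme table. The names `PASHTO_G2P`, `YE_LETTERS`, and `phoneme_for` are hypothetical and are not part of any Google API; the phoneme values are the ones given in this report, and the code points are written as Unicode escapes so they cannot be confused with look-alike Arabic or Urdu letters.

```python
import unicodedata

# Illustrative grapheme-to-phoneme table for the letters named in this report.
# Hypothetical structure; not taken from any real TTS engine.
PASHTO_G2P = {
    "\u069A": {"kandahari": "ʂ", "peshawari": "x"},  # ښ
    "\u0693": {"kandahari": "ɽ", "peshawari": "ɽ"},  # ړ
}

# The five ye letters, each with its own Unicode code point: ي ې ی ۍ ئ
YE_LETTERS = ["\u064A", "\u06D0", "\u06CC", "\u06CD", "\u0626"]


def phoneme_for(grapheme, dialect):
    """Return the dialect-specific phoneme, or None if the table has no entry."""
    return PASHTO_G2P.get(grapheme, {}).get(dialect)


if __name__ == "__main__":
    # Print each ye letter's code point and Unicode name to show they are distinct.
    for ch in YE_LETTERS:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    print(phoneme_for("\u069A", "kandahari"), phoneme_for("\u069A", "peshawari"))
```

A pipeline that collapses these code points into a single Arabic or Urdu letter before phoneme lookup would lose exactly the distinctions listed here.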
Steps to Reproduce
- Input the Pashto phrase ښه راغلاست into the audio generation tool.
- Expected output (Kandahari): [ʂə raɣlaast].
- Expected output (Peshawari): [xə raɣlaast].
- Actual output: the engine replaces the initial fricative (/ʂ/ or /x/) with an incorrect Arabic- or Urdu-style approximation.
This points to a failure in the grapheme-to-phoneme (G2P) mapping pipeline. The audio models need Pashto-specific pronunciation dictionaries that cover the retroflex consonants and their dialect variants, so that the engine stops falling back to incorrect phonemes.
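The reproduction case could be turned into a small regression check along these lines. This is a sketch under stated assumptions: it presumes the engine's audio has already been transcribed to IPA by some external means, and `EXPECTED` and `find_regressions` are hypothetical names introduced here for illustration. The expected strings are copied from the steps above.

```python
import unicodedata

# Expected transcriptions of ښه راغلاست, copied from the reproduction steps.
EXPECTED = {
    "kandahari": "ʂə raɣlaast",
    "peshawari": "xə raɣlaast",
}


def _norm(text):
    """Normalize to NFC and trim whitespace so equal strings compare equal."""
    return unicodedata.normalize("NFC", text).strip()


def find_regressions(actual_by_dialect):
    """Return the sorted dialects whose transcription differs from the expected one.

    actual_by_dialect maps a dialect name to the IPA transcription of the
    engine's output (assumed to be produced outside this snippet).
    A missing dialect counts as a regression.
    """
    failed = []
    for dialect, expected in EXPECTED.items():
        actual = actual_by_dialect.get(dialect)
        if actual is None or _norm(actual) != _norm(expected):
            failed.append(dialect)
    return sorted(failed)
```

For example, passing a Peshawari transcription that begins with /s/ instead of /x/ would report `peshawari` as a regression while leaving a correct Kandahari transcription unflagged.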