What we found when creating audiobooks: you can set a decoding seed parameter and stick to it (like reusing a Minecraft world seed), you can try online system prompts, and you can chunk aggressively. At the end of the day it's still gonna sound different, but it takes the edge off a bit.
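A rough sketch of the "chunk aggressively" part, assuming you split on sentence boundaries and cap each chunk at an arbitrary character limit (the 300-character default and the `chunk_text` name are my own choices, not anything from the TTS API). Each chunk would then be sent separately with the same voice, prompt, and seed.

```python
import re


def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own chunk
    rather than being cut mid-sentence.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Current chunk is full; start a new one with this sentence.
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("One. Two. Three.", max_chars=10):
        print(chunk)
```

Smaller chunks mean each request has less room to drift, at the cost of more requests and more seams to check.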
Unfortunately, I tried it a lot, and decided that it’s not for work or anything serious; it’s just for fun or short clips.
It gives me an inconsistent voice and always picks up an Indian accent for no reason.
I'm experiencing the same audio inconsistency problem in Turkish. I create my audio files using the "Algieba" voice, but even though I give the same parameters every time, the resulting audio differs in speed and tone, as if someone else were speaking. Google must have noticed this problem; I can't understand why they haven't fixed it after so long.
This was such an annoying issue for me that I created ttsaudit .com/ to help find the inconsistent TTS outputs among the working ones. Hopefully it can help you too!