Gemini TTS ignores per-speaker voice settings in multi-character prompts

jokesonyou · May 21, 2025, 9:33am

Anyone else having this issue?
I’ve set different voices for each speaker in a multi-character script using Gemini TTS, but the model only uses one voice for the entire output!
Speaker names are clearly defined, and the settings are configured properly, yet no voice switching happens. I’ve noticed that this happens only when the script for the TTS is long. It had no problem with voice switching when it had to say 2-3 sentences.

This really limits the use of Gemini!

Would appreciate any fixes, workarounds, or confirmation from the team if this is a known bug.

Thanks!

felipem · May 23, 2025, 2:24am

I am also seeing this issue when using the API, and it doesn’t even need to be a long script. It also feels like the tone or gender of the voice in the API output doesn’t match the samples available in AI Studio.

Djalma_Araujo · May 23, 2025, 11:23am

Yes, I am having the same issue in the API. The UI works, but the code copied from Google AI Studio is not working.
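
One thing worth ruling out when the copied code misbehaves is a mismatch between the speaker labels in the script and the speaker names in the voice configuration. The sketch below builds a request body and fails early if a label has no voice assigned. The helper name `build_tts_request` is hypothetical, and the field names (`multiSpeakerVoiceConfig`, `speakerVoiceConfigs`, `prebuiltVoiceConfig`) are an assumption based on the public REST examples, so check them against the current API reference. It does not address the intermittent blending reported in this thread.

```python
import re


def build_tts_request(script: str, voices: dict[str, str]) -> dict:
    """Build a multi-speaker TTS request body (hypothetical helper).

    `voices` maps a speaker label used in the script to a prebuilt voice name.
    """
    # Heuristic: a speaker label is a word-like run at the start of a line,
    # immediately followed by a colon (e.g. "TYRONE:").
    labels = set(re.findall(r"^([A-Za-z][\w ]*):", script, flags=re.MULTILINE))
    missing = labels - set(voices)
    if missing:
        raise ValueError(f"no voice configured for: {sorted(missing)}")

    # Assumed request shape; verify the field names against the API reference.
    speaker_configs = [
        {
            "speaker": speaker,
            "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}},
        }
        for speaker, voice in voices.items()
    ]
    return {
        "contents": [{"parts": [{"text": script}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "multiSpeakerVoiceConfig": {"speakerVoiceConfigs": speaker_configs}
            },
        },
    }
```

Calling it with `{"TYRONE": "Sadachbia", "JAY": "Sadaltager"}` and a script that only uses those two labels returns the body; any other label raises `ValueError` before a request is sent.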

Say_Truth · May 23, 2025, 4:15pm

Have you tried downloading the sound? After downloading, the sound is corrupted.

Pannaga_J · June 27, 2025, 6:44am

Hi @jokesonyou @Djalma_Araujo @felipem
I’ve successfully reproduced the scenario on my end, and it’s now working as expected in both AI Studio and with the copied code. Voice switching is functioning perfectly. Please re-check and let me know if you are still experiencing the issue.
Thank you

jokesonyou · June 27, 2025, 9:17am

It’s partially fixed. I’ve regenerated a couple of times and sometimes it gets it right, sometimes half the audio is voice switching correctly and the other half is wrong. Maybe you wanna recreate the issue? Here’s the prompt directly from the right side of the AI Studio interface:

Use strong British roadman slang, drop your T's, hit them glottal stops — man ain't sayin' "button," it's "bu'on." Emphasise key slang words like fam, bruv, ting, Ps, g, bally, crud. Deliver bars like "Run the till or man'll mek it long" and "Don't be a hero, this ain't Avengers" with menace, like man's really on smoke. Add quick scoffs or laughs after jokes (e.g. "Loool," "Dead," or short chuckles). Use natural London street energy, bounce, and swagger. Slight pauses before punchlines. Speak like you're in a music video or after a cheeky robbery gone smooth. Keep it greazy, keep it raw.
Ready? You're TYRONE and JAY. Let's go:

TYRONE:
OG, man can't even cap, that was bare peak still. You clutched the cashier's face when man whipped out the tings.

JAY:
Swear down man, man looked like he seen a jinn fam. I hit him with the "run the till or man'll make it long."

TYRONE:
Blood started trembling like man's wearing flip-flops in December. Dead. Man folded quicker than a Primark T after one wash. And that little yute behind the till, trying to press panic like it's GTA. Man thinks he's in a Marvel thing.

JAY:
I told him, "don't get gassed fam, man ain't no superhero. This is real block biz." Bro froze like XP with 10 tabs open.

TYRONE:
Nah, you looked on crud though. Bali tight like man was born in it. Even mumsy wouldn't have clocked you, no lie.

JAY:
Pattern Bifferent, G. Can't get caught lacking. Don't want man's face on the feds' TikTok special.

TYRONE:
Come on. You grabbed the bread, yeah?

JAY:
Yeah G, man's got the Ps patterned. Nothing wild, just some light bread. Nando's and a clean tech fleece. Man's good.

TYRONE:
Say less. Next one we hit something proper. No more dusty corner spots. We going uptown, posh cafe vibes. Bear muffins, bear dough, clean getaway.

JAY:
Man moving mad. I ain't trying to get bagged over a croissant fam. Keep it lowkey, ends only. No extra noise.

TYRONE:
Real talk. Gotta move like mist. Not on them front pages. You got the whip?

JAY:
Yeah G, outside. Engine purring like a cat on loud mode. Let's dip, feds might start sniffing.

TYRONE:
Let's cut, fam. Mission done! Real life GTA but with no cheat codes.

JAY:
Come on man. Levels! Whole 'nother tier.

Pannaga_J · June 27, 2025, 10:26am

I’ve tested the prompt you shared on my end, and voice switching is functioning correctly.

Could you let me know which voice samples you used? I tested with Zephyr and Puck, which worked fine.
From the screenshot I see you used Sadachbia and Zubenelgenubi. Are you still using those?

jokesonyou · June 27, 2025, 5:15pm

I switched to sadachbia and sadaltager. Regenerate it like 2-3 times and you will start noticing voices blending into one another and failing to switch between them. I can see the improvement tho. It doesn’t always fail.

Gregory_Hyman · November 12, 2025, 8:17pm

This issue persists in AI Studio generating multi-character TTS. It appears to be intermittent.

WARF_Radio · November 28, 2025, 2:54pm

Agree. I was doing some tests tonight via API and at random times it would not respect the accent I had assigned to a speaker.

For now I get around it in a two (or more) person dialogue by using single speaker (via API) and getting FFMPEG to automatically stitch all the files together as one. It’s passable, but not as flowing in conversation as multi-speaker.
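
A rough sketch of that workaround, assuming the script puts each speaker label alone on its own line as in the prompt above: split the script into turns, synthesize each turn as a single-speaker clip, then let FFmpeg's concat demuxer join the clips. The function names and file layout here are hypothetical, and the single-speaker TTS call itself is left out because it depends on the client being used.

```python
import re

# A speaker label alone on a line, e.g. "TYRONE:" (assumed script layout).
TURN_RE = re.compile(r"^([A-Z][A-Z0-9_ ]*):[ \t]*$", re.MULTILINE)


def split_turns(script: str) -> list[tuple[str, str]]:
    """Return [(speaker, text), ...]; text before the first label is dropped."""
    parts = TURN_RE.split(script)
    # parts == [preamble, speaker1, text1, speaker2, text2, ...]
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts), 2)]


def plan_clips(script: str, voices: dict[str, str], out_dir: str = "clips") -> list[dict]:
    """One single-speaker job per turn, numbered so the order survives on disk."""
    return [
        {
            "voice": voices[speaker],
            "text": text,
            "path": f"{out_dir}/{n:03d}_{speaker}.wav",
        }
        for n, (speaker, text) in enumerate(split_turns(script))
    ]


def concat_list(paths: list[str]) -> str:
    """Contents of the list file read by FFmpeg's concat demuxer."""
    return "".join(f"file '{p}'\n" for p in paths)


def ffmpeg_command(list_file: str, output: str) -> list[str]:
    """Stream-copy join; assumes every clip has the same format and sample rate."""
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", output]
```

After each job in the plan has been rendered to its `path`, write `concat_list(...)` to a text file and pass `ffmpeg_command(...)` to `subprocess.run`. Any style instructions at the top of the prompt are dropped by `split_turns`, so they would need to be prepended to each turn's text separately.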

For my music based radio station, I’ll try multi-speaker again this week, but I don’t trust it.

*Side Note: A few times I got multi-speaker to give a “speaking via the telephone” effect to make the second person sound like a caller, but again, it’s inconsistent in whether Gemini TTS chooses to apply it or not.
