I wish Gemini TTS could support AAC or M4A as a response format

Will · June 14, 2025, 3:37am

Currently, the Gemini text-to-speech (TTS) interface (gemini-2.5-pro-preview-tts) supports only linear PCM in a WAV container as the output format. The official documentation does not provide any configuration parameter for encoding speech directly into AAC/M4A.

I really wish it could support AAC/M4A format in the response. Thanks!
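For context, here is a minimal Python sketch of the step that produces the WAV file today: wrapping raw linear-PCM bytes in a WAV container with the standard-library `wave` module. The format (16-bit mono at 24 kHz) and the helper names `write_wav` and `wav_duration_seconds` are assumptions for illustration, so please check them against the current documentation.

```python
import wave


def write_wav(path, pcm_bytes, channels=1, sample_rate=24000, sample_width=2):
    """Wrap raw linear-PCM bytes in a WAV container.

    The defaults (mono, 24 kHz, 2 bytes per sample) are assumptions,
    not values confirmed in this thread.
    """
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_bytes)


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()
```

Anything beyond WAV (AAC, M4A, MP3) currently has to be produced by a separate encoding step after this one.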

sad · June 14, 2025, 10:30am

“Modified by moderator” It causes me to be very impolite to gemmy sometimes. It would be really nice to drive to work and converse about what’s on your mind; it could take the place of lying in bed way too late at night reading Wikipedia articles, like a few years ago.

In general though, yeah, I totally agree and support this.

GUNAND_MAYANGLAMBAM · June 16, 2025, 8:20am

Hi @Will, welcome to the forum.

Thank you for the feedback. We have noted your request for AAC/M4A output support in Gemini TTS and will share it with the team for consideration.

KAISEON · June 16, 2025, 10:55pm

Surely you can just convert WAV=>AAC/M4A, right?

Frustrating, I know,
but it’s a single-step solution.

I use mobile devices only - Android/Samsung
(phone + tablet + BT keyboard)

No PC… Sorry…

Here are my steps for converting WAV=>MP3:

MiXplorer (my main file browser for years)
Select the .wav file
Rename it, changing the file extension from .wav to “.mp3”

That’s it.
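One caveat on the rename step: as far as I know, renaming changes only the file name, not the audio encoding. Many players still open the renamed file because they detect the format from its contents, but the data inside stays uncompressed PCM. A rough Python sketch for checking what a file really contains, based on its leading bytes (the function name `sniff_audio_container` is hypothetical, and the checks are deliberately simplistic):

```python
def sniff_audio_container(path):
    """Guess the audio container from the file's first bytes, ignoring its extension."""
    with open(path, "rb") as f:
        head = f.read(12)

    # WAV: "RIFF" <4-byte size> "WAVE"
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    # MP4/M4A: <4-byte box size> "ftyp"
    if head[4:8] == b"ftyp":
        return "mp4/m4a"
    # MP3: an ID3v2 tag, or an MPEG frame sync (11 set bits) with layer bits 01 (Layer III)
    if head[:3] == b"ID3":
        return "mp3"
    if len(head) >= 2 and head[0] == 0xFF and head[1] & 0xE0 == 0xE0 \
            and (head[1] >> 1) & 0x03 == 0x01:
        return "mp3"
    return "unknown"
```

If this reports `wav` for a file named `something.mp3`, the rename worked as a label only, and a real encoder is still needed to get the smaller file size.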

Works too with images generated as
PNG=>JPG

MP3 & JPG are more “user-friendly” for
sharing or uploading in my experience.

WordPress, Email, Social Media,
Google Workspace…

Maybe Gemini, NotebookLM, AI Studio and Labs
will eventually get to AAC or exporting other formats.

But until then, we can try to find
the most efficient workarounds
to get the desired end result…

Try searching:
“open-source AAC converter”

Takes no longer than hitting
Generate Deep Dive
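For anyone who does have a PC or server in the loop, one open-source option is ffmpeg. Below is a minimal Python sketch that re-encodes the WAV output to AAC in an M4A container. It assumes ffmpeg is installed and on the PATH; the function names, file names and the 128k bitrate are illustrative choices, not anything from the Gemini documentation.

```python
import shutil
import subprocess


def build_ffmpeg_command(src_wav, dst_m4a, bitrate="128k"):
    """Build an ffmpeg command that encodes a WAV file to AAC in an M4A container."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", src_wav,   # input file
        "-c:a", "aac",   # use ffmpeg's built-in AAC encoder
        "-b:a", bitrate, # target audio bitrate
        dst_m4a,
    ]


def wav_to_m4a(src_wav, dst_m4a, bitrate="128k"):
    """Run the conversion; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_command(src_wav, dst_m4a, bitrate), check=True)
```

Usage would look like `wav_to_m4a("speech.wav", "speech.m4a")`. Unlike renaming, this actually re-encodes the audio, so the output should be considerably smaller than the WAV.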
