Gemini TTS API returns raw PCM data instead of standard audio formats - any way to get MP3/WAV directly?

Niran_Pravithana · September 22, 2025, 4:12am

Hi Google AI team and community,

I’m using the Gemini TTS API (gemini-2.5-flash-preview-tts) and noticed it returns raw PCM data in base64 format, which requires
additional processing to convert to browser-compatible formats like MP3 or WAV.

Current workflow:

1. Call Gemini TTS API → Get raw PCM data (base64)
2. Convert base64 → Buffer
3. Use ffmpeg to convert PCM → MP3
4. Send MP3 to client

Issues:

Step 3 consumes 250-300MB memory in Firebase Functions
Exceeds default 256MB memory limit
Requires additional processing time and resources

Questions:

1. Is there a way to configure Gemini TTS to return MP3/WAV format directly?
2. Are there plans to add output format options to the API?
3. Any recommended best practices for efficient PCM→MP3 conversion in serverless environments?

Would appreciate any guidance or workarounds from the community!

API details:

Model: gemini-2.5-flash-preview-tts
Platform: Firebase Functions (Node.js)

Aciax_Hls · September 22, 2025, 8:23am

Increase Firebase Function Memory: The most straightforward solution is to increase the memory allocation for your Firebase Function. In your index.js file, you can specify this using runWith, which is part of the 1st gen firebase-functions API. For example:

```javascript
const functions = require('firebase-functions');

exports.generateAudio = functions
  .runWith({ memory: '512MB' }) // or '1024MB'
  .https.onCall(async (data, context) => {
    // Your existing code here
  });
```
This will increase the memory limit from the default 256MB, giving ffmpeg the headroom it needs. Note that this will also increase the cost of your function invocations.
Use a Lightweight PCM-to-WAV/MP3 Library: Instead of running a full-blown ffmpeg process, use a pure Node.js library that can handle the conversion. ffmpeg is powerful, but its overhead is what's killing your memory. A library that takes the PCM buffer and prepends a WAV header is much more efficient, because the audio samples themselves are left untouched. For example, the wav package can wrap your PCM data with a WAV header to create a valid WAV file.

Example: Converting PCM to WAV in Node.js

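```javascript
const os = require('os');
const path = require('path');
const wav = require('wav');

// Decode the base64 PCM returned by the API and write it out as a WAV file.
function createWavFile(pcmData) {
  const pcmBuffer = Buffer.from(pcmData, 'base64');
  // Cloud Functions only allows writes under the temp directory.
  const outputPath = path.join(os.tmpdir(), 'output.wav');
  const writer = new wav.FileWriter(outputPath, {
    channels: 1,       // mono
    sampleRate: 24000, // 24 kHz
    bitDepth: 16,      // 16-bit samples
  });
  writer.write(pcmBuffer);
  writer.end();
  return outputPath;
}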
```
Converting to MP3 is more complex as it requires an encoder, but some libraries might offer this functionality without the overhead of ffmpeg. However, for a quick solution, converting to WAV is a good first step, as it’s a widely compatible format and the process is very light on resources.
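If writing a temp file is not desirable, the WAV wrapping can also be done entirely in memory, since a basic WAV file is just a 44-byte RIFF header followed by the PCM samples. Below is a minimal sketch, not an official method. The function name pcmToWav is made up for illustration, and the defaults (mono, 24000 Hz, 16-bit little-endian PCM) are assumptions carried over from the example above; adjust them if your output differs.

```javascript
// Hypothetical helper: wrap base64-encoded raw PCM in a 44-byte RIFF/WAVE header.
// Assumed defaults: mono, 24000 Hz, 16-bit little-endian PCM.
function pcmToWav(base64Pcm, { channels = 1, sampleRate = 24000, bitDepth = 16 } = {}) {
  const pcm = Buffer.from(base64Pcm, 'base64');
  const blockAlign = channels * (bitDepth / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;     // bytes per second of audio

  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + pcm.length, 4);  // file size minus the first 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);              // size of the fmt chunk for PCM
  header.writeUInt16LE(1, 20);               // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitDepth, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(pcm.length, 40);      // size of the PCM payload in bytes

  return Buffer.concat([header, pcm]);
}
```

The returned Buffer can be sent to the client as audio/wav or base64-encoded again for a callable function's response. Peak memory stays at roughly twice the audio size, since the only large allocation is the concatenated copy.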

As far as I know, that is the approach usually described for Firebase implementations. You may well have a setup in mind that is both simpler and more powerful.
