Hi Google AI team and community,
I’m using the Gemini TTS API (gemini-2.5-flash-preview-tts) and noticed it returns raw PCM data in base64 format, which requires
additional processing to convert to browser-compatible formats like MP3 or WAV.
Current workflow:
1. Call Gemini TTS API → Get raw PCM data (base64)
2. Convert base64 → Buffer
3. Use ffmpeg to convert PCM → MP3
4. Send MP3 to client
Issues:
- Step 3 (the ffmpeg conversion) consumes 250-300 MB of memory in Firebase Functions
- This exceeds the default 256 MB memory limit
- It also requires additional processing time and resources
Questions:
- Is there a way to configure Gemini TTS to return MP3/WAV format directly?
- Are there plans to add output format options to the API?
- Any recommended best practices for efficient PCM→MP3 conversion in serverless environments?
Would appreciate any guidance or workarounds from the community!
API details:
- Model: gemini-2.5-flash-preview-tts
- Platform: Firebase Functions (Node.js)
- Increase Firebase Function Memory: The most straightforward solution is to increase the memory allocation for your Firebase Function. For 1st-gen functions, you can set this in your index.js file using runWith. For example:
const functions = require('firebase-functions');

exports.generateAudio = functions
  .runWith({ memory: '512MB' }) // or '1GB'
  .https.onCall(async (data, context) => {
    // Your existing code here
  });
This will increase the memory limit from the default 256MB, giving ffmpeg the headroom it needs. Note that this will also increase the cost of your function invocations.
- Use a Lightweight PCM-to-WAV/MP3 Library: Instead of running a full-blown ffmpeg process, use a pure Node.js library that can handle the conversion. While ffmpeg is powerful, its overhead is what’s killing your memory. A library that takes the PCM buffer and prepends a WAV header is much more efficient, because WAV is just the raw PCM samples behind a small header and no re-encoding is needed. For example, a library like wav can wrap your PCM data with a WAV header to create a valid WAV file.
Example: Converting PCM to WAV in Node.js
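const os = require('os');
const path = require('path');
const wav = require('wav');
const { Buffer } = require('buffer');

// Decode the base64 PCM from the API and write it out as a WAV file.
// Returns the path of the file being written.
function createWavFile(pcmData) {
  const pcmBuffer = Buffer.from(pcmData, 'base64');
  // Write under the temp directory; the rest of the filesystem is
  // typically read-only in Cloud Functions.
  const outputPath = path.join(os.tmpdir(), 'output.wav');
  const writer = new wav.FileWriter(outputPath, {
    channels: 1,       // mono
    sampleRate: 24000, // 24 kHz
    bitDepth: 16,      // 16-bit samples
  });
  writer.write(pcmBuffer);
  writer.end(); // the file is finalized asynchronously after this call
  return outputPath;
}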
Converting to MP3 is more complex because it requires an actual encoder rather than just a header. Pure-JavaScript encoders exist (lamejs is one example) and may avoid the overhead of ffmpeg, though I have not measured their memory use. For a quick solution, converting to WAV is a good first step, as it’s a widely compatible format and the process is very light on resources.
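To show how little work the WAV route involves, here is a minimal sketch that builds the WAV file entirely in memory with no dependencies. It assumes the PCM is 16-bit little-endian mono at 24 kHz (the same values as in the example above), and pcmBase64ToWav is a hypothetical helper name, not part of any SDK.

```javascript
// Minimal sketch: wrap raw PCM in a 44-byte RIFF/WAVE header, in memory.
// Assumes 16-bit little-endian PCM; the defaults (24 kHz, mono) are
// assumptions taken from the example above, so adjust them if your audio differs.
// pcmBase64ToWav is a hypothetical helper name.
function pcmBase64ToWav(pcmBase64, { sampleRate = 24000, channels = 1, bitDepth = 16 } = {}) {
  const pcm = Buffer.from(pcmBase64, 'base64');
  const blockAlign = channels * (bitDepth / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;     // bytes per second of audio

  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + pcm.length, 4);     // file size minus the first 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);                 // size of the fmt chunk
  header.writeUInt16LE(1, 20);                  // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitDepth, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(pcm.length, 40);         // size of the PCM payload

  return Buffer.concat([header, pcm]);
}
```

The returned Buffer can be sent to the client as audio/wav or re-encoded to base64 with toString('base64'). At these settings the audio itself is only 48,000 bytes per second (about 2.9 MB per minute), so this step should stay well under the 256MB limit for typical TTS clips.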
In my opinion, these are the approaches usually suggested for Firebase implementations.
That said, you may know of a setup that is both simpler and more powerful.
On Mon, 22 Sep 2025, 12:22, Niran Pravithana via Google AI Developers Forum <notifications@discuss.ai.google.dev> wrote: