Gemini TTS API returns raw PCM data instead of standard audio formats - any way to get MP3/WAV directly?

Hi Google AI team and community,

I’m using the Gemini TTS API (gemini-2.5-flash-preview-tts) and noticed it returns raw PCM data in base64 format, which requires
additional processing to convert to browser-compatible formats like MP3 or WAV.

Current workflow:

  1. Call Gemini TTS API → Get raw PCM data (base64)
  2. Convert base64 → Buffer
  3. Use ffmpeg to convert PCM → MP3
  4. Send MP3 to client

Issues:

  • Step 3 consumes 250-300MB memory in Firebase Functions
  • Exceeds default 256MB memory limit
  • Requires additional processing time and resources

Questions:

  1. Is there a way to configure Gemini TTS to return MP3/WAV format directly?
  2. Are there plans to add output format options to the API?
  3. Any recommended best practices for efficient PCM→MP3 conversion in serverless environments?

Would appreciate any guidance or workarounds from the community!

API details:

  • Model: gemini-2.5-flash-preview-tts
  • Platform: Firebase Functions (Node.js)

  • Increase Firebase Function memory: The most straightforward fix is to raise the memory allocation for your Firebase Function. With 1st-gen functions you can set this in your index.js file using runWith. For example:

    JavaScript

    const functions = require('firebase-functions');
    
    exports.generateAudio = functions
      .runWith({ memory: '512MB' }) // or '1GB'; runWith accepts 128MB, 256MB, 512MB, 1GB, 2GB, 4GB, 8GB
      .https.onCall(async (data, context) => {
        // Your existing code here
      });
    
    

    This will increase the memory limit from the default 256MB, giving ffmpeg the headroom it needs. Note that this will also increase the cost of your function invocations.

  • Use a Lightweight PCM-to-WAV/MP3 Library: Instead of spawning a full ffmpeg process, use a pure Node.js library for the conversion. ffmpeg is powerful, but its overhead is most likely what exhausts your memory. WAV needs no encoding step: the PCM samples are stored as-is behind a short header, so a library like wav can wrap your PCM buffer into a valid WAV file at very little cost. MP3 is different, because it needs an actual encoder and not just a header.

    Example: Converting PCM to WAV in Node.js

    JavaScript

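    const os = require('os');
    const path = require('path');
    const wav = require('wav');
    
    // Wraps base64-encoded PCM (16-bit, mono, 24 kHz) in a WAV container.
    // Returns a promise that resolves with the path of the finished file.
    function createWavFile(pcmData) {
      const pcmBuffer = Buffer.from(pcmData, 'base64');
      // In Cloud Functions only the temp directory is writable.
      const outputPath = path.join(os.tmpdir(), 'output.wav');
      return new Promise((resolve, reject) => {
        const writer = new wav.FileWriter(outputPath, {
          channels: 1,
          sampleRate: 24000,
          bitDepth: 16,
        });
        // 'done' fires once the header has been rewritten with the final size.
        writer.on('done', () => resolve(outputPath));
        writer.on('error', reject);
        writer.write(pcmBuffer);
        writer.end();
      });
    }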
    
    

    Converting to MP3 is more complex as it requires an encoder, but some libraries might offer this functionality without the overhead of ffmpeg. However, for a quick solution, converting to WAV is a good first step, as it’s a widely compatible format and the process is very light on resources.
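
The "wrap the PCM data with a WAV header" idea can also be sketched with no dependency at all, since a canonical PCM WAV header is only 44 bytes. This is a minimal sketch, not a definitive implementation: the function name pcmToWav and its option names are made up for illustration, and the defaults (24 kHz, 16-bit, mono, little-endian samples) are assumptions carried over from the example above, so check them against what the API actually returns.

```javascript
// Hypothetical helper: prepends a 44-byte canonical PCM WAV header to raw samples.
// Defaults are assumptions (24 kHz, mono, 16-bit little-endian).
function pcmToWav(pcmBuffer, { sampleRate = 24000, channels = 1, bitDepth = 16 } = {}) {
  const blockAlign = channels * (bitDepth / 8);   // bytes per sample frame
  const byteRate = sampleRate * blockAlign;       // bytes per second
  const header = Buffer.alloc(44);

  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + pcmBuffer.length, 4); // total file size minus 8
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);                   // size of the fmt chunk for PCM
  header.writeUInt16LE(1, 20);                    // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitDepth, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(pcmBuffer.length, 40);     // size of the sample data

  return Buffer.concat([header, pcmBuffer]);
}

// Usage: one second of silence stands in for the decoded API response.
const pcm = Buffer.alloc(24000 * 2);
const wavBuffer = pcmToWav(pcm);
console.log(wavBuffer.length); // 48044 (48000 bytes of samples + 44-byte header)
```

Because everything stays in memory, the result can be returned to the client directly, for example as wavBuffer.toString('base64'), without touching the filesystem or spawning a process.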

As far as I know, these are the approaches usually recommended for Firebase. Others here may know of a setup that is simpler and more capable.

On Mon, 22 Sep 2025, 12:22, Niran Pravithana via Google AI Developers Forum <notifications@discuss.ai.google.dev> wrote: