Live API -- support for mulaw (g711_ulaw) input/output?

Hi – the gemini-2.0-flash-live “family” of models has been great.

It would be super useful if, in addition to PCM16 i/o, these streaming models supported mulaw / g711_ulaw. Is that possible?

My application pipes data to/from a phone system and currently I have to re-encode to/from PCM16 on the fly. This works but it’s choppy. The OpenAI realtime system supports g711_ulaw i/o and it’s much smoother – but I’d rather stick with gemini :grinning_face:
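
For anyone landing here with the same setup, this is a minimal sketch of the on-the-fly re-encoding described above, in pure Python. It follows the usual G.711 mu-law bias-and-segment scheme; the function names are hypothetical, and the raw little-endian PCM16 framing is an assumption about what your phone bridge hands you. (The standard-library `audioop` module used to cover this with `ulaw2lin`/`lin2ulaw`, but it was removed in Python 3.13.)

```python
import struct

BIAS = 0x84   # 132, added to the magnitude before locating the segment
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit sample as one mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # The segment (exponent 0..7) is given by the highest set bit (bit 7..14).
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(byte: int) -> int:
    """Decode one mu-law byte to a signed 16-bit sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


# Decoding only has 256 possible inputs, so precompute a lookup table.
_DECODE_TABLE = [ulaw_to_linear(b) for b in range(256)]


def ulaw_to_pcm16(data: bytes) -> bytes:
    """Convert mu-law bytes to little-endian PCM16 bytes (same sample rate)."""
    return struct.pack(f"<{len(data)}h", *(_DECODE_TABLE[b] for b in data))


def pcm16_to_ulaw(data: bytes) -> bytes:
    """Convert little-endian PCM16 bytes to mu-law bytes (same sample rate)."""
    count = len(data) // 2  # a trailing odd byte, if any, is ignored
    samples = struct.unpack(f"<{count}h", data[: count * 2])
    return bytes(linear_to_ulaw(s) for s in samples)
```

Note that this only changes the encoding. It does not change the sample rate, so 8 kHz phone audio is still 8 kHz after `ulaw_to_pcm16`.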

Hi @MB_AST,

Thank you for your valuable suggestions. We appreciate your input and will be sure to share this with the team.

Bump for this. We were in the process of switching from OpenAI to Gemini, but got stuck when we found out that the g711_ulaw format is not supported. Any news on this?

+1

Same issue here… workarounds like converting on the fly from ulaw to PCM result, most of the time, in unnecessary noise.

Same here… please fix this, we need it urgently. Many startups I know are using it and have hit the same issue.

+1

Bumping this as well; we’re currently stuck with OpenAI Realtime because of this issue.

You can build it yourself, but it’s not only the missing ulaw support: the audio level and frequency adjustment is also not done in Gemini Live, so you have to implement both yourself in Cloud Run.
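
A rough sketch of what those two custom steps could look like on raw PCM16, with some assumptions spelled out: I am reading "frequency adjustment" as sample-rate conversion, and assuming 8 kHz phone audio going into a model that expects 16 kHz input (check the Live API docs for the rate that actually applies to you). The function names are hypothetical, and linear interpolation is a simple stand-in for a proper resampling filter.

```python
import struct


def apply_gain(pcm16: bytes, gain: float) -> bytes:
    """Scale little-endian PCM16 samples by `gain`, clipping to the int16 range."""
    count = len(pcm16) // 2
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    scaled = (max(-32768, min(32767, round(s * gain))) for s in samples)
    return struct.pack(f"<{count}h", *scaled)


def upsample_2x(pcm16: bytes) -> bytes:
    """Double the sample rate (e.g. 8 kHz -> 16 kHz) by linear interpolation.

    Each input sample is followed by the midpoint between it and the next
    sample; the last sample is simply repeated.
    """
    count = len(pcm16) // 2
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    out = []
    for i, current in enumerate(samples):
        following = samples[i + 1] if i + 1 < count else current
        out.append(current)
        out.append((current + following) // 2)
    return struct.pack(f"<{len(out)}h", *out)
```

When this runs per chunk on a stream, the midpoint at each chunk boundary uses the repeated last sample instead of the first sample of the next chunk. Carrying the last sample over between chunks would avoid that small discontinuity.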

Hi, same issue here.
The OpenAI Realtime API offers two options for configuring a session: “input_audio_format” and “output_audio_format.”
Both can be pcm16, g711_ulaw, or g711_alaw.
The Twilio website contains an article on using VoIP with the OpenAI Realtime API. [Modified by moderator]
I’ve already created an application that leverages the Gemini Live API with both the gemini-2.5-flash-preview-native-audio-dialog and gemini-live-2.5-flash-preview models, using a browser as the client. However, the finished product will need to use VoIP.
Are you planning to add support for VoIP audio formats?
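
To make the comparison concrete, here is a small sketch of how the two session options mentioned above could be set. The field names and allowed values are the ones quoted in this post; wrapping them in a `session.update` event is my understanding of the beta Realtime API and should be checked against the current OpenAI docs. `build_session_update` is a hypothetical helper, not part of any SDK.

```python
import json

# Values quoted in the post above.
SUPPORTED_FORMATS = ("pcm16", "g711_ulaw", "g711_alaw")


def build_session_update(input_format: str, output_format: str) -> str:
    """Return a JSON string configuring the session's audio formats.

    Assumption: the options are sent in a `session.update` event.
    """
    for fmt in (input_format, output_format):
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported audio format: {fmt!r}")
    event = {
        "type": "session.update",
        "session": {
            "input_audio_format": input_format,
            "output_audio_format": output_format,
        },
    }
    return json.dumps(event)


# For a phone bridge, both directions would be mu-law:
payload = build_session_update("g711_ulaw", "g711_ulaw")
```

With something equivalent on the Gemini side, the telephony payload could be forwarded as-is, without a transcoding step in between.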

Just to add +1

Supporting output at an 8 kHz sample rate would be great when used with Twilio.
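
Until then, a crude sketch of the workaround, assuming the model returns 24 kHz little-endian PCM16 (please verify the output rate in the Live API docs). It averages each group of three samples, which is a very rough low-pass plus decimation; a proper resampler with a real anti-aliasing filter will sound better. `downsample_by_3` is a hypothetical name.

```python
import struct


def downsample_by_3(pcm16: bytes) -> bytes:
    """Reduce the sample rate by 3 (e.g. 24 kHz -> 8 kHz).

    Each output sample is the average of three consecutive input samples.
    Leftover samples that do not fill a group of three are dropped.
    """
    count = len(pcm16) // 2
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    usable = count - count % 3
    out = [
        round((samples[i] + samples[i + 1] + samples[i + 2]) / 3)
        for i in range(0, usable, 3)
    ]
    return struct.pack(f"<{len(out)}h", *out)
```

When streaming, either keep chunk sizes at a multiple of three samples or carry the leftover samples into the next chunk, otherwise audio is silently dropped at chunk boundaries.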

I’ve been following this post since it was first published, and the post and its comments are accurate. To test the Live API, we had to convert the audio while it was streaming. It works, but the result doesn’t sound good because the conversion leaves some noise. We are using Twilio. :slight_smile:

Any updates on this issue? OpenAI made an interesting announcement about the Realtime API and also provided some excellent documentation on how to interface it with SIP. Can we hope to see something similar from Google?
