Hi – the gemini-2.0-flash-live “family” of models has been great.
It would be super useful if, in addition to PCM16 i/o, these streaming models supported mulaw / g711_ulaw. Is that possible?
My application pipes data to/from a phone system, and currently I have to re-encode to/from PCM16 on the fly. This works, but it’s choppy. The OpenAI Realtime API supports g711_ulaw i/o and it’s much smoother, but I’d rather stick with Gemini.
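For anyone who needs the same workaround, here is a minimal sketch of the on-the-fly re-encoding in pure Python. It follows the widely used G.711 mu-law reference algorithm (bias 0x84, clip at 32635). The function names `pcm16_to_ulaw` and `ulaw_to_pcm16` are my own, and it assumes mono, little-endian PCM16. It only changes the encoding, not the sample rate.

```python
import struct

BIAS = 0x84    # standard G.711 mu-law bias
CLIP = 32635   # largest magnitude that still fits after adding the bias


def _encode_sample(sample: int) -> int:
    """Encode one signed 16-bit sample as one mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): position of the highest set bit, from bit 14 down to bit 7.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def _decode_byte(code: int) -> int:
    """Decode one mu-law byte to a signed 16-bit sample."""
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude


# Decoding is a pure 256-entry lookup, so precompute it once.
_DECODE_TABLE = [_decode_byte(b) for b in range(256)]


def pcm16_to_ulaw(pcm: bytes) -> bytes:
    """Convert little-endian PCM16 mono audio to mu-law (one byte per sample)."""
    n = len(pcm) // 2
    samples = struct.unpack(f"<{n}h", pcm[: n * 2])
    return bytes(_encode_sample(s) for s in samples)


def ulaw_to_pcm16(ulaw: bytes) -> bytes:
    """Convert mu-law audio to little-endian PCM16 mono."""
    return struct.pack(f"<{len(ulaw)}h", *(_DECODE_TABLE[b] for b in ulaw))
```

Usage would be something like `ulaw_to_pcm16(chunk_from_phone)` on the way in and `pcm16_to_ulaw(chunk_from_model)` on the way out. The codec itself is stateless, so chunk boundaries should not matter for this step.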
Bump for this. We were in the process of switching from OpenAI to Gemini, but we got stuck when we found out that the g711_ulaw format is not supported. Any news on this?
You can implement it yourself, but the missing ulaw support is not the only gap: Gemini Live also doesn’t do audio level or frequency adjustment, so you have to implement both yourself in Cloud Run.
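A rough sketch of what that custom step could look like, assuming "frequency adjustment" means sample-rate conversion and "audio level" means a simple gain. The rates are assumptions on my part: 8 kHz on the phone side, 16 kHz PCM16 into Gemini Live, 24 kHz PCM16 out of it. The helper names are hypothetical, and the filtering is deliberately crude (linear interpolation up, 3-sample averaging down), so treat it as a starting point rather than production resampling.

```python
import struct


def _unpack(pcm: bytes) -> tuple:
    """Read little-endian PCM16 mono bytes into a tuple of ints."""
    n = len(pcm) // 2
    return struct.unpack(f"<{n}h", pcm[: n * 2])


def _pack(samples: list) -> bytes:
    """Write a list of ints back to little-endian PCM16 bytes."""
    return struct.pack(f"<{len(samples)}h", *samples)


def upsample_2x(pcm: bytes) -> bytes:
    """8 kHz -> 16 kHz (assumed rates) by linear interpolation."""
    samples = _unpack(pcm)
    out = []
    for i, s in enumerate(samples):
        # The last sample has no successor inside this chunk, so repeat it.
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)
        out.append((s + nxt) // 2)
    return _pack(out)


def downsample_3x(pcm: bytes) -> bytes:
    """24 kHz -> 8 kHz (assumed rates) by averaging each group of 3 samples."""
    samples = _unpack(pcm)
    usable = len(samples) - len(samples) % 3  # drop an incomplete trailing group
    return _pack([sum(samples[i:i + 3]) // 3 for i in range(0, usable, 3)])


def apply_gain(pcm: bytes, gain: float) -> bytes:
    """Scale the level by `gain`, clamping to the 16-bit range to avoid wrap-around."""
    return _pack([max(-32768, min(32767, int(s * gain))) for s in _unpack(pcm)])
```

Note that these helpers treat every chunk independently, so small discontinuities at chunk boundaries are possible. Carrying the last sample (or a proper filter state) across chunks would likely help if the output sounds choppy or noisy.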
Hi, same issue here.
The OpenAI Realtime API offers two options for configuring a session: “input_audio_format” and “output_audio_format.”
Both can be pcm16, g711_ulaw, or g711_alaw.
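For reference, a small sketch of how those two options might be sent. The field names and allowed values are the ones quoted above; the `session.update` event type and the nesting under `session` are how I remember the Realtime API docs and may differ between API versions, and `build_session_update` is just a hypothetical helper name.

```python
import json

# The three formats named above for the OpenAI Realtime API.
ALLOWED_FORMATS = {"pcm16", "g711_ulaw", "g711_alaw"}


def build_session_update(audio_format: str = "g711_ulaw") -> str:
    """Build a JSON event that sets the same format for input and output audio."""
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported audio format: {audio_format}")
    event = {
        "type": "session.update",  # assumed event name, check the current docs
        "session": {
            "input_audio_format": audio_format,
            "output_audio_format": audio_format,
        },
    }
    return json.dumps(event)
```

The resulting string would be sent over the already-open WebSocket to the Realtime API; with g711_ulaw on both sides, audio from the phone system can pass through without re-encoding.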
The Twilio website contains an article on using VoIP with the OpenAI Realtime API. [Modified by moderator]
I’ve already created an application that leverages the Gemini Live API with both the gemini-2.5-flash-preview-native-audio-dialog and gemini-live-2.5-flash-preview models, using a browser as the client. However, the finished product will need to use VoIP.
Are you planning to add support for VoIP audio formats?
I’ve been following this post since it was first published. The post and its comments are very accurate. To test the Live API, we had to convert the audio while it was streaming. It works, but the result doesn’t sound good because the conversion adds some noise. We are using Twilio.
Any updates on this issue? OpenAI made an interesting announcement about the Realtime API and also provided some excellent documentation on how to interface it with SIP. Can we hope to see something similar from Google?