Hello everyone,
I’m working on a conversational AI application using the Gemini Live API (specifically the gemini-2.5-flash-preview-native-audio-dialog model) with an Expo React Native frontend.
My core challenge is on the client side: What is the official or community-recommended best practice for playing the real-time audio stream from the Live API in a React Native app?
The API streams raw, 24kHz, 16-bit mono PCM audio chunks every ~40ms over a WebSocket. The playback needs to be low-latency and perfectly gapless to be viable for a real-time conversation.
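For reference, here's roughly what the receive path looks like on our side. The message shape (serverContent.modelTurn.parts[].inlineData carrying base64 PCM) is my reading of the Live API's WebSocket protocol, and LIVE_API_URL / enqueueForPlayback are placeholders, so treat this as a sketch rather than gospel:

```typescript
// Sketch of the receive path, assuming JSON text frames with base64 audio in
// serverContent.modelTurn.parts[].inlineData (verify against the current
// docs; the API may also send binary frames you'd need to decode first).
import { Buffer } from "buffer"; // npm 'buffer' polyfill for React Native

const ws = new WebSocket(LIVE_API_URL); // endpoint/auth setup omitted

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data as string);
  const parts = msg.serverContent?.modelTurn?.parts ?? [];
  for (const part of parts) {
    if (part.inlineData?.mimeType?.startsWith("audio/pcm")) {
      // ~40 ms of 24 kHz, 16-bit signed little-endian mono PCM
      const pcm = Buffer.from(part.inlineData.data, "base64");
      enqueueForPlayback(pcm); // the hard part -- the subject of this post
    }
  }
};
```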
I’ve done extensive research into the available options, and each seems to have critical limitations for this specific use case:
- Standard Audio Libraries (e.g., `expo-audio`, `react-native-track-player`)
  - Issue: These libraries are architected to play audio files from a URL or local storage. They are not designed to handle a high-frequency stream of raw, in-memory PCM buffers. Playing each 40ms chunk as a new "sound" incurs per-chunk setup and teardown, which is slow and leaves audible gaps (see the WAV-wrapping sketch after this list).
  - Link: https://github.com/doublesymmetry/react-native-track-player
- Specialized Streaming Libraries
  - `react-native-live-audio-stream`: This library seems purpose-built for the job, but it appears to be unmaintained, which is a major risk for a production app.
  - `react-native-audio-api`: This library is very promising, since it aims to bring the Web Audio API to mobile. However, its own documentation confirms that the key APIs this use case needs (AudioWorkletNode, or a queuing node like AudioBufferQueueSourceNode) are not yet implemented.
  - Link: https://github.com/software-mansion/react-native-audio-api
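To make the first limitation concrete: the only way to feed a file-oriented player raw PCM is to wrap each ~40ms buffer in a WAV container and register it as a brand-new source, and that per-chunk overhead is exactly where the gaps come from. A minimal sketch of just the wrapping step (this is the standard 44-byte WAV header, nothing library-specific):

```typescript
// Wrap one raw PCM chunk in a minimal 44-byte WAV header so a file-oriented
// player can consume it. Doing this (plus writing a file and constructing a
// new player) every ~40 ms is the anti-pattern that produces gaps.
function pcmChunkToWav(pcm: Uint8Array, sampleRate = 24000): Uint8Array {
  const header = new ArrayBuffer(44);
  const v = new DataView(header);
  const writeStr = (off: number, s: string) =>
    [...s].forEach((c, i) => v.setUint8(off + i, c.charCodeAt(0)));
  writeStr(0, "RIFF");
  v.setUint32(4, 36 + pcm.length, true); // RIFF chunk size
  writeStr(8, "WAVE");
  writeStr(12, "fmt ");
  v.setUint32(16, 16, true);             // fmt sub-chunk size
  v.setUint16(20, 1, true);              // audio format: PCM
  v.setUint16(22, 1, true);              // channels: mono
  v.setUint32(24, sampleRate, true);     // sample rate: 24000
  v.setUint32(28, sampleRate * 2, true); // byte rate (16-bit mono)
  v.setUint16(32, 2, true);              // block align
  v.setUint16(34, 16, true);             // bits per sample
  writeStr(36, "data");
  v.setUint32(40, pcm.length, true);     // data size
  const wav = new Uint8Array(44 + pcm.length);
  wav.set(new Uint8Array(header), 0);
  wav.set(pcm, 44);
  return wav;
}
```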
Our Current Conclusion:
This research has led us to conclude that the only viable path is to build our own custom native module that uses AVAudioEngine on iOS and AudioTrack (in streaming mode) on Android to handle the low-level audio queuing and playback. A sketch of the JS-side contract we have in mind is below.
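For anyone weighing the same decision, this is the shape of the module we're considering. Everything here is hypothetical (the module name, method names, and extractAudioChunk are ours; nothing exists yet). The native side would own a single streaming AudioTrack (Android) or AVAudioEngine player node (iOS) and keep appending buffers to it, so playback stays one continuous stream:

```typescript
// Hypothetical JS-side contract for the custom native module we'd build.
import { NativeModules } from "react-native";

interface PcmStreamPlayer {
  // Create the native engine once per conversation; format is fixed up front.
  start(sampleRate: number, bitsPerSample: number, channels: number): Promise<void>;
  // Append one base64 PCM chunk to the native queue without blocking JS.
  enqueue(base64Pcm: string): void;
  // Drain the queue and release the native engine.
  stop(): Promise<void>;
}

const Player = NativeModules.PcmStreamPlayer as PcmStreamPlayer;

export async function startSession(ws: WebSocket) {
  await Player.start(24000, 16, 1);
  ws.onmessage = (event) => {
    // extractAudioChunk: parse the Live API message as in the earlier snippet
    const chunkB64 = extractAudioChunk(event.data);
    if (chunkB64) Player.enqueue(chunkB64);
  };
}
```

(One known cost of this design: passing base64 strings across the bridge means an extra copy per chunk. A JSI/TurboModule that accepts an ArrayBuffer would avoid that, but it's an optimization on top of the same architecture.)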
My Question for the Community:
Before we commit to the significant effort of building a custom native module, I have to ask:
- Are we missing something?
- Is there an official Google-recommended approach or a hidden SDK for handling the Live API’s audio stream on mobile?
- How have others in the community successfully solved this specific problem?
Any advice, best practices, or library recommendations would be hugely appreciated. Thank you!