Static Audio Output from Gemini Live API (google-genai SDK) on iOS with AVAudioEngine

Hi everyone,

I’m encountering a persistent issue with the Gemini Live API (google-genai SDK) where the audio output received from the API results in loud static during playback on iOS (tested on both Simulator and a physical iPhone), even though the connection and basic interaction seem to work. I’m hoping someone might have insights or suggestions.

Goal:
Implement real-time, bidirectional voice conversation between a SwiftUI iOS app and a Python (FastAPI) backend using the Gemini Live API for STT/LLM/TTS.

Setup:

  • Backend: Python 3.11, FastAPI, Uvicorn, google-genai SDK v1.12.1 (using AI Studio API Key).

  • Frontend: SwiftUI, iOS [Your Target iOS Version, e.g., 17.5+], Xcode 16.0 (16A242d). Using AVAudioEngine for recording (AudioManager) and playback (AudioPlayer), and URLSessionWebSocketTask for communication (NetworkManager).

  • API Call: Backend uses client.aio.live.connect with model="gemini-2.0-flash-live-001" (also tried gemini-1.5-pro-latest) and a minimal LiveConnectConfig(response_modalities=["AUDIO"]). The system prompt is sent via send_client_content; user audio chunks are sent via send_realtime_input(audio=types.Blob(…)). A sketch of this call pattern follows the format list below.

  • Audio Formats:

    • Frontend Mic → Backend: 16kHz, 16-bit Mono PCM

    • Backend → Google API: 16kHz, 16-bit Mono PCM (via Blob)

    • Google API → Backend (Expected): 24kHz, 16-bit Mono PCM (raw bytes in response.data)

    • Backend → Frontend: Raw bytes received from Google API via WebSocket (send_bytes).

    • Frontend Playback Target: Configured for 24kHz, 16-bit Mono PCM input.
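
For reference, a minimal sketch of the call pattern described above (the system prompt text and the mic-chunk source are placeholders, not the exact production code):

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="AI_STUDIO_API_KEY")  # AI Studio key, not Vertex AI

config = types.LiveConnectConfig(response_modalities=["AUDIO"])

async def run_session(mic_chunks):
    async with client.aio.live.connect(
        model="gemini-2.0-flash-live-001", config=config
    ) as session:
        # System prompt, sent once up front via send_client_content
        await session.send_client_content(
            turns=types.Content(
                role="user",
                parts=[types.Part(text="You are a helpful voice assistant.")],
            )
        )
        # User audio: 16kHz, 16-bit mono PCM chunks relayed from the iOS app
        async for chunk in mic_chunks:
            await session.send_realtime_input(
                audio=types.Blob(data=chunk, mime_type="audio/pcm;rate=16000")
            )
```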

Problem:
When the backend receives response.data from the Gemini API stream and forwards these bytes to the iOS app, playing these bytes using AVAudioEngine / AVAudioPlayerNode results in loud static noise, not clear speech. This happens consistently on both the iOS Simulator and a physical iPhone.

What Works:

  • WebSocket connection between frontend and backend is stable.

  • Backend successfully connects to the Gemini Live API (the “:white_check_mark: Entered Gemini Live session.” log line appears).

  • System prompt is sent successfully via send_client_content.

  • User audio (16kHz Int16 PCM) is successfully captured by AudioManager, sent to the backend, and sent to Google via send_realtime_input without backend errors.

  • Backend receives binary data in response.data from the Gemini stream after user speaks.

  • Backend correctly sends audio_start, audio_end, and is_ai_speaking state updates to the frontend via WebSocket (a sketch of this forwarding shape follows this list).

  • Frontend receives these state updates and the binary data chunks.

  • Frontend AudioPlayer setup does not crash with the latest configurations tried.
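
A rough sketch of that forwarding shape, assuming JSON state messages and binary audio frames over the same FastAPI WebSocket (the exact message payloads here are placeholders):

```python
from fastapi import FastAPI, WebSocket
from google import genai
from google.genai import types

app = FastAPI()
client = genai.Client(api_key="AI_STUDIO_API_KEY")

@app.websocket("/ws")
async def websocket_endpoint(ws: WebSocket):
    await ws.accept()
    async with client.aio.live.connect(
        model="gemini-2.0-flash-live-001",
        config=types.LiveConnectConfig(response_modalities=["AUDIO"]),
    ) as session:
        # (mic upload task omitted; see the earlier sketch)
        await ws.send_json({"type": "is_ai_speaking", "value": True})
        await ws.send_json({"type": "audio_start"})
        async for response in session.receive():
            if response.data:
                await ws.send_bytes(response.data)  # raw PCM bytes, unmodified
        await ws.send_json({"type": "audio_end"})
        await ws.send_json({"type": "is_ai_speaking", "value": False})
```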

Debugging Steps Taken & Key Finding:

  1. SDK Migration: Confirmed we are using the current google-genai SDK (v1.12.1), not the deprecated google-generativeai. Resolved initial import errors.

  2. API Connection: Resolved various AttributeErrors and TypeErrors related to LiveConnectConfig and connection methods by simplifying the config, managing the context manually (__aenter__/__aexit__), and eventually using send_realtime_input for audio. The connection is now stable.

  3. Model Name: Confirmed gemini-1.5-flash-latest is rejected by the API for bidiGenerateContent. Switched to gemini-2.0-flash-live-001 (also briefly tested gemini-1.5-pro-latest; it still produced static).

  4. iOS AudioPlayer Implementation (Extensive Debugging):

  • Tried various AVAudioEngine graph setups (direct connection, intermediate mixer).

  • Tried multiple buffer creation/scheduling methods (scheduling Int16 buffers directly, using AVAudioConverter to create Float32 buffers matching processing format, using AVAudioConverter to create Float32 buffers resampled to hardware rate).

  • Ensured careful AVAudioSession configuration (.playAndRecord, .voiceChat, .mixWithOthers) and activation management.

  • Addressed multiple -10868 (FormatNotSupported) crashes related to node connections.

  • The final stable AudioPlayer uses AVAudioConverter to convert received Int16@24k data to Float32@HardwareRate buffers before scheduling. This eliminated crashes but the static remained.

  5. Data Verification (CRITICAL FINDINGS):

  • Modified NetworkManager.swift to save the raw Data bytes received from the WebSocket directly to a .rawpcm file, bypassing AudioPlayer.

  • Imported this received_audio.rawpcm file into Audacity using the expected format parameters (Signed 16-bit PCM, Little-endian, 1 Channel (Mono), 24000 Hz Sample Rate).

  • Result: The audio played back in Audacity directly from the saved raw bytes is also static noise.

  6. Backend Save (CRITICAL FINDING): Modified the backend main.py (receive_from_google function) to save the response.data received directly from the Gemini API stream to a .wav file on the server (using Python’s wave module, set to 1 channel, 16-bit, 24kHz) before sending anything over the WebSocket.

  • Result: Playing this backend-saved .wav file directly also produced static noise.
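
A sketch of that save step, assuming the session object from the first sketch (the output path is arbitrary):

```python
import wave

async def receive_from_google(session):
    # Dump everything Gemini returns for one turn into a file we can inspect.
    with wave.open("gemini_output_check.wav", "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 16-bit samples
        wf.setframerate(24000)  # documented 24kHz output rate
        async for response in session.receive():
            if response.data:
                wf.writeframes(response.data)
                # (the same bytes are then forwarded over the WebSocket)
```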

Conclusion:
Since the raw bytes produce static both when saved on the backend immediately after arriving in response.data and when the frontend interprets them with the documented format settings (before any AVAudioEngine processing), the problem appears to originate from the Gemini Live API itself for the tested models (gemini-2.0-flash-live-001, and briefly gemini-1.5-pro-latest) when accessed via an AI Studio API key. The data being returned does not appear to be clean 24kHz, 16-bit PCM; the issue is not in the iOS audio playback code or the WebSocket transmission.

Questions:

  1. Has anyone else successfully received clear 24kHz, 16-bit PCM audio output from the Gemini Live API using the google-genai SDK (v1.x) with an AI Studio API Key (not Vertex AI)?

  2. Is there a different, known-working model name compatible with AI Studio keys for the Live API audio output?

  3. Could the audio data format being returned be different from the documented 24kHz, 16-bit, signed, little-endian PCM (e.g., different encoding, endianness, headers)? A byte-level sanity check like the sketch after this list could rule some of these out.

  4. Are there any specific configurations or flags needed in LiveConnectConfig (even if using a dictionary) or the initial connection for this specific model/API key combination that might affect audio output quality?
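
On question 3 specifically, a hypothetical byte-level check on the received_audio.rawpcm dump from step 5 can rule out an endianness or container-header mix-up:

```python
import numpy as np

raw = open("received_audio.rawpcm", "rb").read()

# An unexpected container header (e.g. b'RIFF' for WAV) would show up here:
print(raw[:16])

# Decode as 16-bit PCM under both byte orders; speech should show a clear
# peak/RMS gap, while static tends to look like full-scale noise in both.
for label, dtype in [("little-endian", "<i2"), ("big-endian", ">i2")]:
    samples = np.frombuffer(raw[: len(raw) // 2 * 2], dtype=dtype)
    peak = int(np.abs(samples.astype(np.int32)).max())
    rms = float(np.sqrt(np.mean(samples.astype(np.float64) ** 2)))
    print(f"{label}: peak={peak}, rms={rms:.0f}")
```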

Any help or pointers would be greatly appreciated! We’ve hit a wall after resolving the connection and client-side playback issues.

Thanks!

Addie Design

Hi,
In a similar situation here as well. I have tried similar sampling rates and the pyaudio Int16 format, but the output appears to be static noise. Earlier I tried an approach with Speech-to-Text and then sending the user prompt to the API endpoint, which worked pretty well for en-US, but in order to support diverse languages dynamically I went for gemini-2.0-flash-live-001, which resulted in this. If you come across any solutions please share; it would be really helpful.
thank you
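
A minimal version of that pyaudio check, assuming the documented 24kHz, 16-bit mono output format (the dump file name is just an example):

```python
import pyaudio

pcm_bytes = open("received_audio.rawpcm", "rb").read()  # raw API output dump

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=24000, output=True)
stream.write(pcm_bytes)  # static here as well points at the data itself
stream.stop_stream()
stream.close()
pa.terminate()
```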

I’ve got the same issue

@Siva_Sravana_Kumar_N @GUNAND_MAYANGLAMBAM

Any resolution, folks? Would really appreciate it if this gets fixed :grinning_face_with_smiling_eyes:

Hi, I am following up with the team regarding this issue.

Thanks

By the way, did you check the Get_started_LiveAPI cookbook? Just wondering whether this is an API related issue or a compatibility problem with AVAudioEngine on iOS.

Hi,

Thanks for the suggestion about the cookbook – I’ll double-check it for any differences.

To clarify whether it might be an AVAudioEngine issue, we performed a test directly on the Python backend:

  1. Inside the async for response in gemini_session.receive(): loop, we took the response.data bytes received directly from the Gemini Live API stream.

  2. Before sending these bytes over the WebSocket to the iOS client, we saved them directly into a .wav file on the server using Python’s standard wave module (configured for 1 channel, 16-bit samples, 24000 Hz rate).

  3. Playing this backend-saved .wav file directly on the server machine resulted in the same static noise.

This seems to indicate the issue lies with the raw audio data stream coming from the API itself (using models like gemini-2.0-flash-live-001 with an AI Studio key) rather than being an iOS playback problem.

Are there known issues with audio output quality for these models via the standard Live API endpoint/AI Studio keys, or is there a different recommended model known to provide clean 24kHz 16-bit PCM output?

Thanks again!

Currently gemini-2.0-flash-live-001 is the only model that supports audio output.
I will follow up with you regarding the issue with model quality.
