Problems with Live API Audio Streaming and Function Responses

Xavier_Massana · March 30, 2025, 5:36pm

Hi everyone,

I’m experiencing some issues with the Live API's voice functionality, and I’m hoping someone can help. I’m using the API for voice-related tasks and have run into two main problems:

**Streaming Data with sendRealtimeInput:** When I stream audio data with the following call:

    await this.session.sendRealtimeInput({ media: { data: buffer.toString('base64'), mimeType: 'audio/pcm;rate=16000' } });

it returns nothing: no voice output and no error messages.

**Function Calls and Response Modalities:** Sending text works perfectly. However, I’m facing a significant issue when using the model as an agent, specifically when it needs to call a function. Although I’ve set the response modalities to include both text and audio (`responseModalities: [Modality.TEXT, Modality.AUDIO]`), it only sends the information via audio when it tries to call a function. This makes a multimodal answer impossible for the agent, because text answers are not possible, let alone invoking a function. Interestingly, when I connect directly via WebSockets (which I was using before the API), function calls work correctly.
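For anyone trying to reproduce the two problems, here is a minimal sketch, not a confirmed fix. It splits PCM audio into payloads shaped like the `sendRealtimeInput` call in the post, and sorts each incoming server message into text, audio, and function calls so it is easy to log what actually arrives. The helper names (`toRealtimeChunks`, `streamPcm`, `classifyMessage`) and the 3200-byte chunk size are hypothetical. The message fields (`serverContent.modelTurn.parts`, `toolCall.functionCalls`) are an assumption about the SDK's message shape and should be checked against the messages you receive.

```javascript
// Sketch only. `session` is any object exposing sendRealtimeInput(), as in the post.
// ASSUMPTION: chunk size; 3200 bytes is 100 ms of 16 kHz, 16-bit mono PCM.
const CHUNK_BYTES = 3200;

// Split a raw PCM Buffer into base64 payloads shaped like the call in the post.
function toRealtimeChunks(pcmBuffer, chunkBytes = CHUNK_BYTES) {
  const chunks = [];
  for (let offset = 0; offset < pcmBuffer.length; offset += chunkBytes) {
    const slice = pcmBuffer.subarray(offset, offset + chunkBytes);
    chunks.push({
      media: { data: slice.toString('base64'), mimeType: 'audio/pcm;rate=16000' },
    });
  }
  return chunks;
}

// Send the chunks one at a time, in order.
async function streamPcm(session, pcmBuffer) {
  for (const chunk of toRealtimeChunks(pcmBuffer)) {
    await session.sendRealtimeInput(chunk);
  }
}

// Sort one server message into text parts, audio parts, and function calls.
// ASSUMPTION: field names below mirror the SDK's server message shape.
function classifyMessage(message) {
  const result = { text: [], audio: [], functionCalls: [] };
  const parts = message?.serverContent?.modelTurn?.parts ?? [];
  for (const part of parts) {
    if (typeof part.text === 'string') result.text.push(part.text);
    if (part.inlineData?.data) result.audio.push(part.inlineData.data);
  }
  for (const call of message?.toolCall?.functionCalls ?? []) {
    result.functionCalls.push({ id: call.id, name: call.name, args: call.args ?? {} });
  }
  return result;
}
```

Calling `classifyMessage` on every message in your message callback and logging the result should show whether function calls and text parts are arriving at all, or whether only audio parts come back.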

Has anyone else encountered similar problems? Any insights or suggestions would be greatly appreciated.

Thanks in advance.

phanmemkhoinghiep · June 12, 2025, 5:54pm

Hello, please see this GitHub repository: ontaptom/multimodal-live-api

Mrinal_Ghosh · June 23, 2025, 9:40am

Hi @Xavier_Mas

To help us understand the issue, could you please share a code snippet of what you’ve tried so far?

Thanks!
