Problems with Live API Audio Streaming and Function Responses

Hi everyone,

I’m running into some issues with the voice functionality of the Live API, and I’m hoping someone can help. There are two main problems:

  1. **Streaming Data with `sendRealtimeInput`:** When I stream data using the following call:

     ```javascript
     await this.session.sendRealtimeInput({
       media: {
         data: buffer.toString('base64'),
         mimeType: 'audio/pcm;rate=16000'
       }
     });
     ```

     it doesn’t return anything: neither voice output nor any error messages.
  2. **Function Calls and Response Modalities:** Sending text works perfectly. However, I’m facing a significant issue when using it as an agent that needs to call a function. Although I’ve set the response modalities to include both text and audio (`responseModalities: [Modality.TEXT, Modality.AUDIO]`), it only sends audio when it tries to call a function. This makes a multimodal answer impossible for the agent: text answers are not possible, let alone function invocations.

     Interestingly, when I connect directly via WebSockets (which I was using before the API), function calls work correctly.
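
To help narrow both problems down, here is a minimal sketch of how the incoming messages could be inspected and how a function call could be answered. It is not a confirmed fix. The helper names (`toRealtimePayload`, `classifyServerMessage`, `buildToolResponse`) are hypothetical, and the message fields (`toolCall.functionCalls`, `serverContent.modelTurn.parts`, `inlineData`) are my assumption about the shape of the messages the SDK delivers, so they may need adjusting.

```javascript
// Hypothetical helpers. The field names read from `message` are assumptions
// about the Live API server message shape, not a confirmed contract.

// Wrap raw 16 kHz PCM bytes in the same payload shape used in the call above.
function toRealtimePayload(pcmBuffer) {
  return {
    media: {
      data: pcmBuffer.toString('base64'),
      mimeType: 'audio/pcm;rate=16000',
    },
  };
}

// Sort one incoming server message into the kinds of output it carries,
// so it is easy to log whether text, audio, or a tool call ever arrives.
function classifyServerMessage(message) {
  const result = { functionCalls: [], text: [], audioChunks: 0 };
  if (message.toolCall && Array.isArray(message.toolCall.functionCalls)) {
    result.functionCalls.push(...message.toolCall.functionCalls);
  }
  const parts = message.serverContent?.modelTurn?.parts ?? [];
  for (const part of parts) {
    if (typeof part.text === 'string') result.text.push(part.text);
    if (part.inlineData?.data) result.audioChunks += 1;
  }
  return result;
}

// Build one response entry per requested function call.
// `handlers` maps a function name to a local implementation.
function buildToolResponse(functionCalls, handlers) {
  return {
    functionResponses: functionCalls.map((call) => ({
      id: call.id,
      name: call.name,
      response: handlers[call.name]
        ? { result: handlers[call.name](call.args ?? {}) }
        : { error: `no handler for ${call.name}` },
    })),
  };
}
```

The idea would be to call `classifyServerMessage` on every message received by the session's message callback and log the result, since output (if any) would show up there rather than as a return value of `sendRealtimeInput`. If a function call does show up, the object from `buildToolResponse` is what I would expect to pass back to the session (assuming a method such as `sendToolResponse` is available in the SDK version in use).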

Has anyone else encountered similar problems? Any insights or suggestions would be greatly appreciated.

Thanks in advance.