Hi everyone,
I’m experiencing some issues with the Live API's voice functionality, and I’m hoping someone can help. I’m encountering two main problems:
- **Streaming data with `sendRealtimeInput`:** When I stream audio using the following call:

  ```javascript
  await this.session.sendRealtimeInput({
    media: {
      data: buffer.toString('base64'),
      mimeType: 'audio/pcm;rate=16000'
    }
  });
  ```

  it doesn’t return anything: neither voice output nor any error message.
- **Function calls and response modalities:** Sending text works perfectly. However, I’m facing a significant issue when using the model as an agent, specifically when it needs to call a function. Although I’ve set the response modalities to include both text and audio (`responseModalities: [Modality.TEXT, Modality.AUDIO]`), it only returns audio when it tries to call a function. This makes a multimodal answer impossible for the agent, because text answers are not possible, let alone invoking a function. Interestingly, when I connect directly via WebSockets (which I was using before the API), function calls work correctly.
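In case it helps with reproducing the first problem, here is a minimal sketch of how the payloads for the audio stream could be built. The helper name `toRealtimePayloads` and the 3200-byte chunk size (about 100 ms of 16 kHz, 16-bit mono PCM) are assumptions made for illustration and are not part of the SDK; the object shape simply mirrors the `sendRealtimeInput` call shown in the first item.

```javascript
// Hypothetical helper (not part of the SDK): split raw 16 kHz, 16-bit mono
// PCM into fixed-size chunks and wrap each chunk in the same object shape
// that is passed to sendRealtimeInput in the post.
function toRealtimePayloads(pcm, chunkBytes = 3200) {
  if (!Buffer.isBuffer(pcm)) {
    throw new TypeError('pcm must be a Buffer');
  }
  // 16-bit samples are 2 bytes each, so an odd chunk size would split a sample.
  if (!Number.isInteger(chunkBytes) || chunkBytes <= 0 || chunkBytes % 2 !== 0) {
    throw new RangeError('chunkBytes must be a positive even integer');
  }

  const payloads = [];
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    // subarray clamps the end index, so the last chunk may be shorter.
    const chunk = pcm.subarray(offset, offset + chunkBytes);
    payloads.push({
      media: {
        data: chunk.toString('base64'),
        mimeType: 'audio/pcm;rate=16000'
      }
    });
  }
  return payloads;
}
```

Each payload would then be sent in order, for example `for (const p of toRealtimePayloads(buffer)) await this.session.sendRealtimeInput(p);`. Decoding one `data` field back with `Buffer.from(p.media.data, 'base64')` is a quick way to check that the bytes actually being sent are the expected PCM.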
Has anyone else encountered similar problems? Any insights or suggestions would be greatly appreciated.
Thanks in advance.