I am trying to use the Gemini Live API to return both an audio response and a text transcription of that audio. Why doesn't this work?
In the config I set responseModalities to both TEXT and AUDIO, but the connection closes with this error: GeminiClient: Disconnected: Request contains an invalid argument. Here is the code snippet:
import { GoogleGenAI, Modality } from "@google/genai";
this.options = {
model: "models/gemini-2.0-flash-live-001",
...options,
};
const config = {
responseModalities: [Modality.AUDIO, Modality.TEXT],
systemInstruction: this.options.instructions,
};
this.session = await this.googleAI.live.connect({
model: this.options.model,
config: config,
callbacks: {
onopen: () => {
console.log("✅ GeminiClient: WebSocket opened");
},
onmessage: (message: any) => {
console.log("M");
},
onerror: (error: any) => {
console.error("🚨 GeminiClient: error:", error);
},
onclose: (event: any) => {
console.log("❌ GeminiClient: Disconnected: ", event.reason);
},
},
});
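A possible direction, sketched under assumptions rather than confirmed: as far as I understand, a Live API session accepts only one response modality, so listing both AUDIO and TEXT may be what triggers the invalid argument error. The sketch below requests audio only and asks for a transcript of that audio separately. The `outputAudioTranscription` config field and the `serverContent.outputTranscription.text` message path are assumptions about the SDK and should be checked against the current docs. The type and helper names are hypothetical stand-ins so the block runs without `@google/genai`.

```typescript
// Local stand-ins for the SDK types (hypothetical names), so this sketch
// runs without installing @google/genai.
type ResponseModality = "AUDIO" | "TEXT";

interface LiveConfigSketch {
  responseModalities: ResponseModality[];
  systemInstruction?: string;
  // Assumption: an empty object enables transcription of the model's audio output.
  outputAudioTranscription?: Record<string, never>;
}

interface LiveMessageSketch {
  serverContent?: {
    // Assumption: transcript chunks arrive under this path.
    outputTranscription?: { text?: string };
  };
}

// Build a config that requests a single modality (audio) plus a transcript.
function buildAudioWithTranscriptConfig(instructions?: string): LiveConfigSketch {
  return {
    responseModalities: ["AUDIO"],
    systemInstruction: instructions,
    outputAudioTranscription: {},
  };
}

// Pull the transcript text out of a server message, or "" if there is none.
function extractTranscript(message: LiveMessageSketch): string {
  return message.serverContent?.outputTranscription?.text ?? "";
}
```

If these assumptions hold, the result of `buildAudioWithTranscriptConfig(this.options.instructions)` would be passed as `config` to `live.connect`, and `extractTranscript(message)` would be called inside `onmessage` to collect the text alongside the audio chunks.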
According to the docs, this model can handle both text and audio for both input and output: