Gemini Live API issue with multimodal responses

I am trying to use the Gemini Live API to return both an audio response and a text transcript of that audio. Why doesn't this work?

I set responseModalities to both text and audio in the config, but the connection closes with this error:

GeminiClient: Disconnected: Request contains an invalid argument.

Here is the code snippet:

import { GoogleGenAI, Modality } from "@google/genai";

this.options = {
  model: "models/gemini-2.0-flash-live-001",
  ...options,
};

const config = {
  responseModalities: [Modality.AUDIO, Modality.TEXT],
  systemInstruction: this.options.instructions,
};

this.session = await this.googleAI.live.connect({
  model: this.options.model,
  config: config,
  callbacks: {
    onopen: () => {
      console.log("✅ GeminiClient: WebSocket opened");
    },
    onmessage: (message: any) => {
      console.log("M");
    },
    onerror: (error: any) => {
      console.error("🚨 GeminiClient: error:", error);
    },
    onclose: (event: any) => {
      console.log("❌ GeminiClient: Disconnected: ", event.reason);
    },
  },
});
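
One variation that may be worth trying is sketched below. It assumes that a Live API session accepts only one response modality, which would explain the invalid-argument error. It also assumes that the text can instead be requested through the outputAudioTranscription config field, with transcript chunks arriving in serverContent.outputTranscription.text. The helper names and the two local types are hypothetical and mirror only the fields used here. The string "AUDIO" stands in for Modality.AUDIO so the sketch runs without the SDK.

```typescript
// Hypothetical local types: they mirror only the fields this sketch touches,
// not the full config/message types exported by @google/genai.
type LiveConfigSketch = {
  responseModalities: string[];
  outputAudioTranscription: Record<string, never>;
  systemInstruction?: string;
};

type LiveMessageSketch = {
  serverContent?: { outputTranscription?: { text?: string } };
};

// Build a config that requests audio only, plus a transcript of that audio.
// Assumption: one response modality per session; the transcript is opted into
// separately via an empty outputAudioTranscription object.
function buildAudioWithTranscriptConfig(instructions?: string): LiveConfigSketch {
  return {
    responseModalities: ["AUDIO"], // stands in for Modality.AUDIO
    outputAudioTranscription: {},
    systemInstruction: instructions,
  };
}

// Pull the transcript chunk out of a server message, or "" if there is none.
function extractTranscript(message: LiveMessageSketch): string {
  return message.serverContent?.outputTranscription?.text ?? "";
}
```

If these assumptions hold, the object returned by buildAudioWithTranscriptConfig would be passed as config to live.connect in place of the one above, and extractTranscript would be called from onmessage to collect the text alongside the audio.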

According to the docs, this model can handle both text and audio for both input and output: