Gemini Live API multimodal issue

yoyoma · September 6, 2025, 12:05pm

I am trying to use the Gemini Live API to return both an audio response and a text transcript of that audio. Why doesn't this work?

Here is the code snippet. I set responseModalities to both text and audio in the config, but the connection closes with this error: "GeminiClient: Disconnected: Request contains an invalid argument."

import { GoogleGenAI, Modality } from "@google/genai";

// Default model, overridable by caller-supplied options
this.options = {
  model: "models/gemini-2.0-flash-live-001",
  ...options,
};

// Requesting both AUDIO and TEXT; this is the setting that produces the error
const config = {
  responseModalities: [Modality.AUDIO, Modality.TEXT],
  systemInstruction: this.options.instructions,
};

// Open the Live API session and log each lifecycle event
this.session = await this.googleAI.live.connect({
  model: this.options.model,
  config,
  callbacks: {
    onopen: () => {
      console.log("✅ GeminiClient: WebSocket opened");
    },
    onmessage: (message: any) => {
      console.log("M");
    },
    onerror: (error: any) => {
      console.error("🚨 GeminiClient: error:", error);
    },
    onclose: (event: any) => {
      console.log("❌ GeminiClient: Disconnected: ", event.reason);
    },
  },
});

According to the docs, this model can handle both text and audio for both input and output.
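
One possible workaround, offered as a sketch rather than a confirmed fix: as far as I can tell, a Live API session accepts only one response modality, so the config could request audio alone and ask the server to transcribe that audio via outputAudioTranscription. The helper names below (buildAudioWithTranscriptConfig, TranscriptCollector) are hypothetical, and the LiveMessage shape is a local stand-in that assumes transcripts arrive under serverContent.outputTranscription.text. Nothing here is imported from @google/genai.

```typescript
// Local stand-in for the SDK's server message type. The field names are an
// assumption about the Live API message shape, not imported from the SDK.
type LiveMessage = {
  serverContent?: {
    outputTranscription?: { text?: string };
    turnComplete?: boolean;
  };
};

// Build a config that requests a single response modality (audio) and asks
// the server to transcribe it, instead of requesting AUDIO and TEXT together.
function buildAudioWithTranscriptConfig(instructions?: string) {
  return {
    responseModalities: ["AUDIO"], // assumed to match Modality.AUDIO in the SDK
    outputAudioTranscription: {}, // assumed: empty object enables transcription
    systemInstruction: instructions,
  };
}

// Accumulates transcript fragments across messages until the turn completes.
class TranscriptCollector {
  private parts: string[] = [];

  // Returns the full transcript once the turn completes, otherwise null.
  handle(message: LiveMessage): string | null {
    const text = message.serverContent?.outputTranscription?.text;
    if (text) this.parts.push(text);
    if (message.serverContent?.turnComplete) {
      const full = this.parts.join("");
      this.parts = []; // reset for the next turn
      return full;
    }
    return null;
  }
}
```

In the snippet above, the returned object would take the place of config, and onmessage would pass each message to a TranscriptCollector instance, logging the result whenever it is not null.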
