Here are the details regarding the issue. I am using the Gen AI SDK (JavaScript/React) with the gemini-2.0-flash-exp model via the Multimodal Live API (WebSocket).
The Issue:
Even though I am speaking clearly in English, the model frequently transcribes the input in other languages (e.g., Hindi or Welsh) or produces unrelated characters, and sometimes responds in those languages. This often happens when there is silence or slight background noise.
I attempted to force the input language by setting `model: "en-US"` inside `inputAudioTranscription`, but the API throws a validation error (see below).
Code Snippet:
Here is the configuration I am passing to client.live.connect.
```javascript
sessionRef.current = await aiClientRef.current.live.connect({
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  config: {
    responseModalities: ["AUDIO"], // Using Modality.AUDIO
    systemInstruction: {
      parts: [{
        text: "You are an interviewer. You must listen and respond in English."
      }]
    },
    // The issue occurs regardless of tools, but here is the setup:
    tools: [{
      functionDeclarations: [{
        name: 'end_interview',
        description: 'Ends the interview session.',
        parameters: {
          type: 'object',
          properties: { reason: { type: 'string' } },
          required: ['reason']
        }
      }]
    }],
    // ATTEMPTED FIX:
    // When I leave this empty {}, it auto-detects (poorly).
    // When I try to set { model: "en-US" }, it crashes.
    inputAudioTranscription: {
      // model: "en-US" // <-- This causes the "Invalid JSON payload" error
    },
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: {
          voiceName: "Despina",
        },
      },
    },
    realtimeInputConfig: {
      automaticActivityDetection: {
        disabled: false,
        startOfSpeechSensitivity: "START_SENSITIVITY_LOW",
        endOfSpeechSensitivity: "END_SENSITIVITY_LOW",
        prefixPaddingMs: 20,
        silenceDurationMs: 3000,
      },
    },
  }
});
```
The Error:
When I try to set `model` in `inputAudioTranscription` to fix the detection issue, I receive:

```
Invalid JSON payload received. Unknown name "model" at 'setup.input_audio_transcription': Cannot find field.
```
Steps to Reproduce:
1. Connect to the Live API using the config above.
2. Stream audio chunks from the browser microphone (I am using `Int16Array` PCM).
3. Speak a short English phrase or leave a moment of silence.
4. Observe the `serverContent` transcription events; they often switch to random languages instead of staying in English.
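For context on step 2, the PCM chunks are produced roughly like this. This is a minimal sketch of the Float32-to-Int16 conversion only (my actual capture pipeline is longer and may differ in details):

```javascript
// Sketch: convert Web Audio Float32 samples (range -1..1) into the
// 16-bit PCM (Int16Array) chunks that are streamed to the Live API.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp before scaling so out-of-range samples cannot wrap around.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}
```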
Is there a supported parameter that strictly enforces the input language for the Live API and prevents these transcription hallucinations?
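For completeness, the only language-related field I have found in the SDK's types is `speechConfig.languageCode`, and as far as I can tell it targets the spoken output rather than input transcription, so I am not sure it is the intended fix. Treat the fragment below as an untested assumption:

```javascript
// Untested assumption: SpeechConfig in the JS Gen AI SDK appears to
// accept a languageCode, but it seems to govern the spoken *output*,
// not how the incoming audio is transcribed.
speechConfig: {
  languageCode: "en-US",
  voiceConfig: {
    prebuiltVoiceConfig: { voiceName: "Despina" },
  },
},
```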
Thanks!