I am using the Gemini Realtime Live API for a conversation-based application, via the Gen AI SDK. After making the WebSocket connection, when I speak in English I get wrong transcriptions for it; they come back in different languages.
Hi @prathamesh_mungekar, welcome to the AI Forum!
Could you please provide any steps to reproduce this issue along with relevant code snippets and output logs that demonstrate the issue?
Here are the details regarding the issue. I am using the Gen AI SDK (JavaScript/React) with the gemini-2.0-flash-exp model via the Multimodal Live API (WebSocket).
The Issue:
Even though I am speaking clearly in English, the model frequently transcribes the input as other languages (e.g., Hindi, Welsh, or unrelated characters) and sometimes responds in those languages. This often happens when there is silence or slight background noise.
I attempted to force the input language by setting model: "en-US" inside inputAudioTranscription, but the API throws a validation error (see below).
Code Snippet:
Here is the configuration I am passing to client.live.connect.
sessionRef.current = await aiClientRef.current.live.connect({
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  config: {
    responseModalities: ["AUDIO"], // Using Modality.AUDIO
    systemInstruction: {
      parts: [{
        text: "You are an interviewer. You must listen and respond in English."
      }]
    },
    // The issue occurs regardless of tools, but here is the setup:
    tools: [{
      functionDeclarations: [{
        name: 'end_interview',
        description: 'Ends the interview session.',
        parameters: {
          type: 'object',
          properties: { reason: { type: 'string' } },
          required: ['reason']
        }
      }]
    }],
    // ATTEMPTED FIX:
    // When I leave this empty {}, it auto-detects (poorly).
    // When I try to set { model: "en-US" }, it crashes.
    inputAudioTranscription: {
      // model: "en-US" // <-- This causes an Invalid JSON payload error
    },
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: {
          voiceName: "Despina",
        },
      },
    },
    realtimeInputConfig: {
      automaticActivityDetection: {
        disabled: false,
        startOfSpeechSensitivity: "START_SENSITIVITY_LOW",
        endOfSpeechSensitivity: "END_SENSITIVITY_LOW",
        prefixPaddingMs: 20,
        silenceDurationMs: 3000,
      },
    },
  }
});
The Error:
When I try to define the model in inputAudioTranscription to fix the detection issue, I receive:
Invalid JSON payload received. Unknown name "model" at 'setup.input_audio_transcription': Cannot find field.
Steps to Reproduce:
1. Connect to the Live API using the config above.
2. Stream audio chunks from the browser microphone (I am using Int16Array PCM).
3. Speak a short English phrase or leave a moment of silence.
4. Observe the serverContent transcription events; they often switch to random languages instead of staying in English.
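For context, step 2's Float32-to-Int16 PCM conversion on my side looks roughly like this (a minimal sketch; floatTo16BitPCM is just an illustrative helper name, and I am assuming the browser audio callback delivers Float32 samples in [-1, 1]):

```javascript
// Convert Float32 samples from the Web Audio pipeline into 16-bit signed
// PCM, which is what gets streamed to the Live API as Int16Array chunks.
function floatTo16BitPCM(float32Samples) {
  const int16 = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return int16;
}
```

The same conversion happens regardless of language, so the mis-detection occurs downstream of this step.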
Is there a supported parameter to strictly enforce the Input Language for the Live API to prevent these hallucinations?
Thanks!
@Sonali_Kumari1 any update on this topic ?
For more clarity on this issue, I just tried the steps below. I read a random English text paragraph into my application, but it was detected as a different language. It also does not detect or transcribe the full speech until I stop speaking.
sample text : During my internship, I focused on practical, production-level security measures to ensure safe and reliable integration between the frontend and RESTful APIs.
First, all API communication was enforced over HTTPS to protect data in transit. For protected endpoints, I worked with token-based authentication so only authenticated users could access sensitive resources. On the backend, each request went through proper authorization checks to ensure users could only access data permitted by their role.
I also paid close attention to input validation and sanitization on the server side to prevent common issues like injection attacks or malformed requests. Error handling was standardized so that APIs returned meaningful but non-sensitive error messages, avoiding accidental information leakage.
From the frontend side, I ensured that sensitive logic and credentials were never exposed, handled tokens securely, and implemented proper loading and error states to prevent unintended behavior. Overall, these considerations helped create a secure, predictable, and maintainable API integration.
transscription - "గ రి మై ఇం టర్ సె ప్ ట్ ఫో కస్ ఆన్ ప్రా క్టి కల్ ప్రొ డక్ష న్ లె వల్ సి క్యూ రిటీ మే జర్ టు ఇన్ షూ ర్ సే ఫ్ అండ్ రి లయ బుల్ ఇం టి గ్రే షన్ బి ట్వీన్ ద ఫ్ర ెంటె డ్ అండ్ రే ష పు ల్ పి ఫ స్ట్ ఆల్ ఏ పీ ఎస్ క మ్యూని కేషన్ వా స్ ఇన్ ఫో ర్స్ ఓవర్ హె చ్ టి పి ఎస్ టు ప్రొ టె క్ట్ డే టా ఇన్ ట్రాన్ స్ ఫర్ ప్రొ టె క్టెడ్ ఇన్ ఫ ండ్స్ వర్ క్ వి త్ కం డిషన్ వి త్ ఆ ఫ్టర్ డిఫె న్స్ టు టెంటి వ్ స్ యు హావ్ ఎక్స్ ఎం ప్ లా సెస్ ఫ్ర మ్ బి కాస్ ఇన్ ద ప ర్ చ ేంజ్ మ ె ంట్ సో మచ్ మే సో సి ం ప్ రెం డు రె ం డు రెండు ర ెండు సార్లు సో”
Along with the current issue, I also found one more: when I read this long paragraph, it does not give me the full transcription; it may be freezing due to the long text.
Hi @prathamesh_mungekar , Thanks for sharing code snippet along with reproducible steps.
As per the official documentation on the Live API, inputAudioTranscription has no fields. Since you are passing model: "en-US", the API does not recognize the field name model, hence the Invalid JSON payload error. To stop the model from hallucinating and enforce the input language for the Live API, try setting explicit system instructions, as they might help.
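For example, the system instruction could state the language constraint more explicitly than the original one did (the wording below is only illustrative, not an officially supported language lock):

```javascript
// Sketch: a stricter, positively-phrased system instruction.
systemInstruction: {
  parts: [{
    text: "You are an interviewer. The user always speaks English. " +
          "Treat all audio input as English speech, even if it is quiet " +
          "or noisy, and always transcribe and respond in English."
  }]
},
```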
@Sonali_Kumari1 Yes, I got it. I also tried explicit system instructions, but it is still not working. I even face this issue in Google AI Studio; it does not detect English correctly there either.
And a second issue: when it does detect the language, it does not provide the full transcription of the message.
@Sonali_Kumari1 any update ?
Hi @prathamesh_mungekar ,
While using explicit system instructions, please make sure that you are passing strict instructions phrased in a positive way. Additionally, could you try disabling AutomaticActivityDetection in your config? This will disable the API's auto detection, which might be causing the language-detection and cut-off transcription issues. You can then manually send ActivityStart and ActivityEnd events, which let you control the listening window.
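A sketch of that suggestion, assuming the sendRealtimeInput shapes described in the Live API docs (please verify the field names against your SDK version):

```javascript
// 1. In the connect config: turn off automatic voice activity detection.
realtimeInputConfig: {
  automaticActivityDetection: { disabled: true },
},

// 2. At runtime, bracket each utterance manually
//    (e.g. tied to a push-to-talk button):
session.sendRealtimeInput({ activityStart: {} });
// ... stream the Int16 PCM audio chunks here ...
session.sendRealtimeInput({ activityEnd: {} });
```

With automatic detection disabled, the model only listens between the start and end events you send, so silence and background noise outside that window cannot trigger spurious transcriptions.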
@prathamesh_mungekar did this work?
Hello all, is this issue resolved?
I am also facing the same issue.
@Shrestha_Basu_Mallic @Urvesh_Rathod
No, it's not working. I have not found any solution for this issue to date. I am using alternatives to the Gemini Live API.
I think the Live API is still in preview, so there are not many resources or much support available to fix this issue. It sometimes behaves weirdly. I suggest you try a better alternative for your use case instead of wasting your time on the Gemini Live API, as there are many issues that need fixing in the current version.
Same issue here… I have burned through the last several days trying to make it work, but it still hears nothing but gibberish from me.