I am using the Gemini Realtime Live API for a conversation-based application, via the Gen AI SDK. After making the WebSocket connection, when I speak in English I get wrong transcriptions for it; they come back in different languages.
Hi @prathamesh_mungekar, welcome to the AI Forum!
Could you please provide any steps to reproduce this issue along with relevant code snippets and output logs that demonstrate the issue?
Here are the details regarding the issue. I am using the Gen AI SDK (JavaScript/React) with the gemini-2.0-flash-exp model via the Multimodal Live API (WebSocket).
The Issue:
Even though I am speaking clearly in English, the model frequently transcribes the input as other languages (e.g., Hindi, Welsh, or unrelated characters) and sometimes responds in those languages. This often happens when there is silence or slight background noise.
I attempted to force the input language by setting model: "en-US" inside inputAudioTranscription, but the API throws a validation error (see below).
Code Snippet:
Here is the configuration I am passing to client.live.connect.
sessionRef.current = await aiClientRef.current.live.connect({
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  config: {
    responseModalities: ["AUDIO"], // Using Modality.AUDIO
    systemInstruction: {
      parts: [{
        text: "You are an interviewer. You must listen and respond in English."
      }]
    },
    // The issue occurs regardless of tools, but here is the setup:
    tools: [{
      functionDeclarations: [{
        name: 'end_interview',
        description: 'Ends the interview session.',
        parameters: {
          type: 'object',
          properties: { reason: { type: 'string' } },
          required: ['reason']
        }
      }]
    }],
    // ATTEMPTED FIX:
    // When I leave this empty {}, it auto-detects (poorly).
    // When I try to set { model: "en-US" }, it crashes.
    inputAudioTranscription: {
      // model: "en-US" // <-- This causes an Invalid JSON payload error
    },
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: {
          voiceName: "Despina",
        },
      },
    },
    realtimeInputConfig: {
      automaticActivityDetection: {
        disabled: false,
        startOfSpeechSensitivity: "START_SENSITIVITY_LOW",
        endOfSpeechSensitivity: "END_SENSITIVITY_LOW",
        prefixPaddingMs: 20,
        silenceDurationMs: 3000,
      },
    },
  }
});
The Error:
When I try to define the model in inputAudioTranscription to fix the detection issue, I receive:
Invalid JSON payload received. Unknown name "model" at 'setup.input_audio_transcription': Cannot find field.
Steps to Reproduce:
1. Connect to the Live API using the config above.
2. Stream audio chunks from the browser microphone (I am using Int16Array PCM).
3. Speak a short English phrase or leave a moment of silence.
4. Observe the serverContent transcription events; they often switch to random languages instead of staying in English.
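For context, step 2's Float32-to-Int16 PCM conversion on my side looks roughly like this (a minimal sketch; floatTo16BitPCM is just an illustrative helper name, and I am assuming the browser audio callback delivers Float32 samples in [-1, 1]):

```javascript
// Convert Float32 samples from the Web Audio pipeline into 16-bit signed
// PCM, which is what gets streamed to the Live API as Int16Array chunks.
function floatTo16BitPCM(float32Samples) {
  const int16 = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return int16;
}
```

The same conversion happens regardless of language, so the mis-detection occurs downstream of this step.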
Is there a supported parameter to strictly enforce the Input Language for the Live API to prevent these hallucinations?
Thanks!
@Sonali_Kumari1 any update on this topic ?
For more clarity on this issue, I just tried the steps below. I read a random English text paragraph into my application, but it was detected as a different language. It also does not detect or transcribe the full speech until I stop speaking.
sample text : During my internship, I focused on practical, production-level security measures to ensure safe and reliable integration between the frontend and RESTful APIs.
First, all API communication was enforced over HTTPS to protect data in transit. For protected endpoints, I worked with token-based authentication so only authenticated users could access sensitive resources. On the backend, each request went through proper authorization checks to ensure users could only access data permitted by their role.
I also paid close attention to input validation and sanitization on the server side to prevent common issues like injection attacks or malformed requests. Error handling was standardized so that APIs returned meaningful but non-sensitive error messages, avoiding accidental information leakage.
From the frontend side, I ensured that sensitive logic and credentials were never exposed, handled tokens securely, and implemented proper loading and error states to prevent unintended behavior. Overall, these considerations helped create a secure, predictable, and maintainable API integration.
transscription - "గ రి మై ఇం టర్ సె ప్ ట్ ఫో కస్ ఆన్ ప్రా క్టి కల్ ప్రొ డక్ష న్ లె వల్ సి క్యూ రిటీ మే జర్ టు ఇన్ షూ ర్ సే ఫ్ అండ్ రి లయ బుల్ ఇం టి గ్రే షన్ బి ట్వీన్ ద ఫ్ర ెంటె డ్ అండ్ రే ష పు ల్ పి ఫ స్ట్ ఆల్ ఏ పీ ఎస్ క మ్యూని కేషన్ వా స్ ఇన్ ఫో ర్స్ ఓవర్ హె చ్ టి పి ఎస్ టు ప్రొ టె క్ట్ డే టా ఇన్ ట్రాన్ స్ ఫర్ ప్రొ టె క్టెడ్ ఇన్ ఫ ండ్స్ వర్ క్ వి త్ కం డిషన్ వి త్ ఆ ఫ్టర్ డిఫె న్స్ టు టెంటి వ్ స్ యు హావ్ ఎక్స్ ఎం ప్ లా సెస్ ఫ్ర మ్ బి కాస్ ఇన్ ద ప ర్ చ ేంజ్ మ ె ంట్ సో మచ్ మే సో సి ం ప్ రెం డు రె ం డు రెండు ర ెండు సార్లు సో”
Along with the current issue, I also found one more: when I read this long paragraph, it does not give me the full transcription; it may be freezing due to the long text.
Hi @prathamesh_mungekar , Thanks for sharing code snippet along with reproducible steps.
As per the official documentation on the Live API, inputAudioTranscription has no fields. Since you are passing model: "en-US", the API does not recognize the field name model, hence the Invalid JSON payload error. To stop the model from hallucinating and enforce the input language for the Live API, try setting explicit system instructions, as they might help.
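For example, the system instruction could state the language constraint more explicitly than the original one did (the wording below is only illustrative, not an officially supported language lock):

```javascript
// Sketch: a stricter, positively-phrased system instruction.
systemInstruction: {
  parts: [{
    text: "You are an interviewer. The user always speaks English. " +
          "Treat all audio input as English speech, even if it is quiet " +
          "or noisy, and always transcribe and respond in English."
  }]
},
```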
@Sonali_Kumari1 Yes, I got it. I also tried explicit system instructions, but it is still not working. I even face this issue in Google AI Studio; it does not detect English correctly there either.
And a second issue: when it does detect the language, it does not provide the full transcription of the message.
@Sonali_Kumari1 any update ?
Hi @prathamesh_mungekar ,
While using explicit system instructions, please make sure that you are passing strict instructions phrased in a positive way. Additionally, could you try disabling AutomaticActivityDetection in your config? This will disable the API's auto detection, which might be causing the language-detection and cut-off transcription issues. You can then manually send ActivityStart and ActivityEnd events, which let you control the listening window.
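A sketch of that suggestion, assuming the sendRealtimeInput shapes described in the Live API docs (please verify the field names against your SDK version):

```javascript
// 1. In the connect config: turn off automatic voice activity detection.
realtimeInputConfig: {
  automaticActivityDetection: { disabled: true },
},

// 2. At runtime, bracket each utterance manually
//    (e.g. tied to a push-to-talk button):
session.sendRealtimeInput({ activityStart: {} });
// ... stream the Int16 PCM audio chunks here ...
session.sendRealtimeInput({ activityEnd: {} });
```

With automatic detection disabled, the model only listens between the start and end events you send, so silence and background noise outside that window cannot trigger spurious transcriptions.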
@prathamesh_mungekar did this work?
Hello all, is this issue resolved?
I am also facing the same issue.
@Shrestha_Basu_Mallic @Urvesh_Rathod
No, it's not working. I have not found any solution for this issue to date. I am using alternatives to the Gemini Live API.
I think the Live API is still in preview, so there are not many resources or much support available to fix this issue. It sometimes behaves weirdly. I suggest you try a better alternative for your use case instead of wasting your time on the Gemini Live API, as there are many issues that need fixing in the current version.
Same issue here… I have burned through the last several days trying to make it work, but it still hears nothing but gibberish from me.