I’ve been building a prototype app that uses the Gemini Multimodal Live API. The use case involves streaming video input and then asking text-based questions about the video.
This was working fine earlier, but since the model changed from gemini-2.0-flash-exp to gemini-2.0-flash-live-001, the API no longer responds to text questions about the video. It only seems to respond to audio-based queries now.
Is this a configuration issue on my end, or has something changed with the API behavior? I’d really appreciate your help.
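For reference, here is a minimal sketch of the pattern I’m using. To be clear, this is a simplified reproduction rather than my exact app: it assumes the google-genai Python SDK, and get_jpeg_frames() is a placeholder for my real frame capture.

```python
import asyncio
from pathlib import Path

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

MODEL = "gemini-2.0-flash-live-001"
CONFIG = {"response_modalities": ["TEXT"]}


def get_jpeg_frames():
    # Placeholder frame source; in the real app frames come from a live capture.
    for path in sorted(Path("frames").glob("*.jpg")):
        yield path.read_bytes()


async def main():
    async with client.aio.live.connect(model=MODEL, config=CONFIG) as session:
        # Stream video frames (JPEG bytes) as realtime input, roughly 1 per second.
        for frame_bytes in get_jpeg_frames():
            await session.send_realtime_input(
                video=types.Blob(data=frame_bytes, mime_type="image/jpeg")
            )
            await asyncio.sleep(1.0)

        # Ask a text question about the streamed video.
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": "What is happening in the video?"}]},
            turn_complete=True,
        )

        # With gemini-2.0-flash-live-001, this is where the model now replies
        # that it doesn't have access to the video.
        async for response in session.receive():
            if response.text:
                print(response.text, end="")


asyncio.run(main())
```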
Hi @Govind_Keshari. Yes, it does support video input. However, there’s an issue I’ve noticed. When I stream a video as input and then ask questions about it with text queries, the model responds that it doesn’t have access to the video. But if I ask the same question using audio input instead of text, it gives the correct answer.
So the video does seem to be processed, but the model only responds properly to audio-based queries, not text-based ones. (This exact flow was working perfectly some time back.)
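In case it helps with reproduction: the only difference between my failing and working cases is the input modality. Sending the spoken question as realtime audio, roughly like the sketch below, gets a correct answer. (This again assumes the google-genai Python SDK; the file name is a placeholder, and the audio is raw 16-bit, 16 kHz mono PCM, which is the Live API’s expected input format.)

```python
from pathlib import Path

from google.genai import types


async def ask_by_audio(session, pcm_path="question.pcm"):
    # Send a spoken question as realtime audio input on an already-open
    # Live API session. Expects raw 16-bit, 16 kHz mono PCM audio.
    pcm_bytes = Path(pcm_path).read_bytes()
    await session.send_realtime_input(
        audio=types.Blob(data=pcm_bytes, mime_type="audio/pcm;rate=16000")
    )
```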
I was able to follow the steps you provided, and I can reproduce the issue you’re facing. I’ve also noticed that after a voice input has been sent once, the model can answer text questions about the video as well, but with text alone it doesn’t respond.
Thank you for reporting this issue. I’ll update you as soon as there is any progress on it.