Multimodal Live API - first interaction issues, no response, hallucination, etc.

Peter_McGrath · March 8, 2025, 3:46pm

Hey guys,

I’ve built an iOS app with the Gemini Multimodal Live API that uses video + audio input. I’ve been having a persistent issue for about a month that I can’t solve. When the user connects to the bot, the first interaction goes wrong about 50% of the time. Sometimes the bot never responds; sometimes it interrupts itself or hallucinates a user response. It may be an internal Gemini VAD (voice activity detection) issue, but I’m not sure.

Here’s an example from the server logs of it interrupting itself:


2025-03-08T15:44:10.004 app[d891152c70d638] ord [info] 2025-03-08 15:44:10.003 | DEBUG | pipecat.transports.base_input:_handle_interruptions:124 - User started speaking

2025-03-08T15:44:11.645 app[d891152c70d638] ord [info] 2025-03-08 15:44:11.644 | DEBUG | pipecat.transports.base_input:_handle_interruptions:131 - User stopped speaking

2025-03-08T15:44:12.290 app[d891152c70d638] ord [info] 2025-03-08 15:44:12.289 | DEBUG | pipecat.services.gemini_multimodal_live.gemini:_handle_transcribe_user_audio:270 - [Transcription:user] Hey, what's in this mug?

2025-03-08T15:44:13.291 app[d891152c70d638] ord [info] 2025-03-08 15:44:13.291 | DEBUG | pipecat.transports.base_output:_bot_started_speaking:203 - Bot started speaking

2025-03-08T15:44:14.584 app[d891152c70d638] ord [info] 2025-03-08 15:44:14.583 | DEBUG | pipecat.transports.base_input:_handle_interruptions:124 - User started speaking

2025-03-08T15:44:14.585 app[d891152c70d638] ord [info] 2025-03-08 15:44:14.584 | DEBUG | pipecat.transports.base_output:_bot_stopped_speaking:210 - Bot stopped speaking

2025-03-08T15:44:15.224 app[d891152c70d638] ord [info] 2025-03-08 15:44:15.223 | DEBUG | pipecat.transports.base_input:_handle_interruptions:131 - User stopped speaking

2025-03-08T15:44:15.664 app[d891152c70d638] ord [info] 2025-03-08 15:44:15.664 | DEBUG | pipecat.services.gemini_multimodal_live.gemini:_handle_transcribe_user_audio:270 - [Transcription:user] Earl Grey tea.

2025-03-08T15:44:19.688 app[d891152c70d638] ord [info] 2025-03-08 15:44:19.686 | DEBUG | pipecat.services.gemini_multimodal_live.gemini:_handle_transcribe_model_audio:278 - [Transcription:model] Based on the color, it looks like there is coffee in the mug. Would you like me to search for any coffee recipes?

Peter_McGrath · March 8, 2025, 3:47pm

In that example, the bot started speaking, interrupted itself, and then went silent. However, the output and transcript still came through, twice and with different content each time. Neither was spoken.
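For anyone who wants to measure how often this happens, here is a minimal sketch that scans log lines in the format shown above and flags any "User started speaking" event that arrives while the bot is still speaking. The function name `find_self_interruptions` and the `window_secs` cutoff are made up for illustration, and the script cannot tell a genuine user barge-in from echo or a false VAD trigger; it only surfaces candidates to inspect.

```python
import re
from datetime import datetime

# Matches the inner (pipecat) timestamp and the event message at the end of a line.
LINE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s*\|\s*DEBUG\s*\|\s*\S+ - (.*)$"
)

# Four lines copied from the logs in this thread.
SAMPLE = [
    "2025-03-08T15:44:13.291 app[d891152c70d638] ord [info] 2025-03-08 15:44:13.291 | DEBUG | pipecat.transports.base_output:_bot_started_speaking:203 - Bot started speaking",
    "2025-03-08T15:44:14.584 app[d891152c70d638] ord [info] 2025-03-08 15:44:14.583 | DEBUG | pipecat.transports.base_input:_handle_interruptions:124 - User started speaking",
    "2025-03-08T15:44:14.585 app[d891152c70d638] ord [info] 2025-03-08 15:44:14.584 | DEBUG | pipecat.transports.base_output:_bot_stopped_speaking:210 - Bot stopped speaking",
    "2025-03-08T15:44:15.224 app[d891152c70d638] ord [info] 2025-03-08 15:44:15.223 | DEBUG | pipecat.transports.base_input:_handle_interruptions:131 - User stopped speaking",
]


def find_self_interruptions(lines, window_secs=3.0):
    """Return (bot_start, user_start, gap_secs) for each user-start event
    that occurs while the bot is speaking, within window_secs of bot start."""
    hits = []
    bot_started_at = None  # None means the bot is not currently speaking
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a pipecat DEBUG line in the expected format
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        msg = match.group(2).strip()
        if msg == "Bot started speaking":
            bot_started_at = ts
        elif msg == "Bot stopped speaking":
            bot_started_at = None
        elif msg == "User started speaking" and bot_started_at is not None:
            gap = (ts - bot_started_at).total_seconds()
            if gap <= window_secs:
                hits.append((bot_started_at, ts, gap))
    return hits


if __name__ == "__main__":
    for bot_start, user_start, gap in find_self_interruptions(SAMPLE):
        print(f"possible self-interruption: user speech {gap:.3f}s after bot start")
```

Running this over a full session log, rather than the four sample lines, gives a rough count of how many first turns are cut off shortly after the bot starts talking.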

Peter_McGrath · March 19, 2025, 10:27pm

Does anybody have a similar problem? This is still not solved.

Pannaga_J · June 16, 2025, 6:19am

Hi @Peter_McGrath, apologies for the late response.
It’s been a while. If you are still facing the issue, can you confirm whether you have checked the VAD configuration within your pipecat setup? Have you experimented with less aggressive thresholds for detecting speech? This might be a case of overly sensitive VAD.
Thank you
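To illustrate what "less aggressive thresholds" can mean in practice, below is a self-contained toy sketch, not pipecat's actual implementation. It debounces per-frame speech probabilities with three settings: `confidence`, `start_secs`, and `stop_secs`. These names are chosen to resemble the VAD parameters I believe pipecat exposes, but treat that as an assumption and check the pipecat documentation for the real class and fields. `VADSettings` and `detect_events` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class VADSettings:
    # Hypothetical settings object; field names are assumptions, not pipecat's API.
    confidence: float = 0.7  # minimum per-frame speech probability
    start_secs: float = 0.2  # speech must persist this long before "started"
    stop_secs: float = 0.8   # silence must persist this long before "stopped"


def detect_events(probs, settings, frame_secs=0.02):
    """Turn per-frame speech probabilities into ("started"|"stopped", frame_index) events."""
    # Work in whole frames to avoid floating-point drift when accumulating time.
    start_frames = max(1, round(settings.start_secs / frame_secs))
    stop_frames = max(1, round(settings.stop_secs / frame_secs))
    events = []
    speaking = False
    run = 0  # consecutive frames that contradict the current state
    for i, p in enumerate(probs):
        is_speech = p >= settings.confidence
        if is_speech != speaking:
            run += 1
            needed = stop_frames if speaking else start_frames
            if run >= needed:
                speaking = not speaking
                events.append(("started" if speaking else "stopped", i))
                run = 0
        else:
            run = 0
    return events


# A 120 ms blip (6 frames of 20 ms), e.g. speaker echo, surrounded by silence.
blip = [0.1] * 10 + [0.9] * 6 + [0.1] * 50

aggressive = VADSettings(confidence=0.5, start_secs=0.1, stop_secs=0.2)
conservative = VADSettings(confidence=0.7, start_secs=0.2, stop_secs=0.8)

if __name__ == "__main__":
    print("aggressive:  ", detect_events(blip, aggressive))
    print("conservative:", detect_events(blip, conservative))
```

With the aggressive settings the short blip is reported as a user turn, which would interrupt the bot; with the conservative settings it is ignored. Note that this only models client-side VAD. If the false trigger comes from Gemini's server-side detection, as the original post suspects, tuning the pipecat side alone may not be enough.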
