Hey guys,
I’ve built an iOS app with the Gemini Multimodal Live API that uses video + audio input. I’ve been having a persistent issue for about a month that I can’t solve. When the user connects to the bot, the first interaction goes wrong about 50% of the time: sometimes the bot never responds, and sometimes it interrupts itself or hallucinates a user response. It may be an internal Gemini VAD issue, but I’m not sure.
Here’s an example from the server logs of it interrupting itself:
`2025-03-08T15:44:10.004 app[d891152c70d638] ord [info] 2025-03-08 15:44:10.003 | DEBUG | pipecat.transports.base_input:_handle_interruptions:124 - User started speaking
2025-03-08T15:44:11.645 app[d891152c70d638] ord [info] 2025-03-08 15:44:11.644 | DEBUG | pipecat.transports.base_input:_handle_interruptions:131 - User stopped speaking
2025-03-08T15:44:12.290 app[d891152c70d638] ord [info] 2025-03-08 15:44:12.289 | DEBUG | pipecat.services.gemini_multimodal_live.gemini:_handle_transcribe_user_audio:270 - [Transcription:user] Hey, what's in this mug?
2025-03-08T15:44:13.291 app[d891152c70d638] ord [info] 2025-03-08 15:44:13.291 | DEBUG | pipecat.transports.base_output:_bot_started_speaking:203 - Bot started speaking
2025-03-08T15:44:14.584 app[d891152c70d638] ord [info] 2025-03-08 15:44:14.583 | DEBUG | pipecat.transports.base_input:_handle_interruptions:124 - User started speaking
2025-03-08T15:44:14.585 app[d891152c70d638] ord [info] 2025-03-08 15:44:14.584 | DEBUG | pipecat.transports.base_output:_bot_stopped_speaking:210 - Bot stopped speaking
2025-03-08T15:44:15.224 app[d891152c70d638] ord [info] 2025-03-08 15:44:15.223 | DEBUG | pipecat.transports.base_input:_handle_interruptions:131 - User stopped speaking
2025-03-08T15:44:15.664 app[d891152c70d638] ord [info] 2025-03-08 15:44:15.664 | DEBUG | pipecat.services.gemini_multimodal_live.gemini:_handle_transcribe_user_audio:270 - [Transcription:user] Earl Grey tea.
2025-03-08T15:44:19.688 app[d891152c70d638] ord [info] 2025-03-08 15:44:19.686 | DEBUG | pipecat.services.gemini_multimodal_live.gemini:_handle_transcribe_model_audio:278 - [Transcription:model] Based on the color, it looks like there is coffee in the mug. Would you like me to search for any coffee recipes?