Lag when using the Multimodal Live API from the open-source web console

When using Gemini Flash 2.0 Experimental, I experience intermittent lag while receiving audio feedback from the model. It's like a stutter that sometimes delays the response for 10 to 30 seconds. I am on a fork of multimodal-live-api-web-console and am unsure how to improve the quality of the audio feedback.

Their audio handling for incoming chunks is very basic, and my network conditions are unstable, so the chunks arrive with inconsistent sizes and timing.

I am actually confused as to why they did not use WebRTC as the protocol for communicating with the Multimodal Live API endpoint. OpenAI supports it now as well.

You can ask your coding LLM how to improve the handling of incoming audio chunks, e.g. with a more dynamic buffer, to make playback more robust.

The main file for this is audio-streamer.ts
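As a starting point, one common approach is an adaptive jitter buffer: queue incoming chunks, only start playback once a target amount of audio is buffered, and grow that target after an underrun. The sketch below is illustrative only, not code from the repo; all names (`JitterBuffer`, `targetMs`, etc.) are assumptions, and wiring it into `audio-streamer.ts` and the Web Audio playback loop is left out.

```typescript
// Hypothetical adaptive jitter buffer for incoming PCM chunks.
// Trades a little extra latency for fewer stutters on unstable networks.
class JitterBuffer {
  private queue: Float32Array[] = [];
  private buffered = 0;        // total samples currently queued
  private target: number;      // samples required before playback (re)starts
  private playing = false;

  constructor(
    private sampleRate: number,
    initialTargetMs = 100,
    private maxTargetMs = 500,
  ) {
    this.target = Math.round((initialTargetMs / 1000) * sampleRate);
  }

  // Call for every chunk arriving from the WebSocket.
  push(chunk: Float32Array): void {
    this.queue.push(chunk);
    this.buffered += chunk.length;
    if (!this.playing && this.buffered >= this.target) {
      this.playing = true; // enough audio queued to start smoothly
    }
  }

  // Call from the playback loop; returns null while (re)filling.
  pull(samples: number): Float32Array | null {
    if (!this.playing) return null;
    if (this.buffered < samples) {
      // Underrun: pause playback and grow the target buffer so the
      // next start has more headroom against network jitter.
      this.playing = false;
      const maxTarget = Math.round((this.maxTargetMs / 1000) * this.sampleRate);
      this.target = Math.min(this.target * 2, maxTarget);
      return null;
    }
    const out = new Float32Array(samples);
    let written = 0;
    while (written < samples) {
      const head = this.queue[0];
      const take = Math.min(head.length, samples - written);
      out.set(head.subarray(0, take), written);
      if (take === head.length) this.queue.shift();
      else this.queue[0] = head.subarray(take);
      written += take;
    }
    this.buffered -= samples;
    return out;
  }
}
```

The playback side would feed `pull()` into an `AudioBufferSourceNode` or an `AudioWorklet`; when it returns null, output silence and wait for the buffer to refill rather than playing fragments as they arrive.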