Lag when using the Multimodal Live API from the open-source web console

When using Gemini Flash 2.0 Experimental, I experience intermittent lag while receiving audio feedback from the model. It's like a stutter that sometimes delays the response for 10 to 30 seconds. I am on a fork of multimodal-live-api-web-console and am unsure how to improve the quality of the audio feedback.

Their audio handling for incoming chunks is very basic, and my network conditions are unstable, so the chunks arrive with inconsistent sizes and timing.

I am actually confused as to why they did not use WebRTC as the protocol for communicating with the Multimodal Live API endpoint. OpenAI supports it now as well.

You can ask your coding LLM how to improve the handling of incoming audio chunks, e.g. with a more dynamic buffer, to make playback more robust.

The main file for this is audio-streamer.ts
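As a starting point, one common approach is an adaptive jitter buffer: queue incoming chunks, only start playback once a target amount of audio is buffered, and grow that target after an underrun. The sketch below is illustrative only, not code from the repo; all names (`JitterBuffer`, `targetMs`, etc.) are assumptions, and wiring it into `audio-streamer.ts` and the Web Audio playback loop is left out.

```typescript
// Hypothetical adaptive jitter buffer for incoming PCM chunks.
// Trades a little extra latency for fewer stutters on unstable networks.
class JitterBuffer {
  private queue: Float32Array[] = [];
  private buffered = 0;        // total samples currently queued
  private target: number;      // samples required before playback (re)starts
  private playing = false;

  constructor(
    private sampleRate: number,
    initialTargetMs = 100,
    private maxTargetMs = 500,
  ) {
    this.target = Math.round((initialTargetMs / 1000) * sampleRate);
  }

  // Call for every chunk arriving from the WebSocket.
  push(chunk: Float32Array): void {
    this.queue.push(chunk);
    this.buffered += chunk.length;
    if (!this.playing && this.buffered >= this.target) {
      this.playing = true; // enough audio queued to start smoothly
    }
  }

  // Call from the playback loop; returns null while (re)filling.
  pull(samples: number): Float32Array | null {
    if (!this.playing) return null;
    if (this.buffered < samples) {
      // Underrun: pause playback and grow the target buffer so the
      // next start has more headroom against network jitter.
      this.playing = false;
      const maxTarget = Math.round((this.maxTargetMs / 1000) * this.sampleRate);
      this.target = Math.min(this.target * 2, maxTarget);
      return null;
    }
    const out = new Float32Array(samples);
    let written = 0;
    while (written < samples) {
      const head = this.queue[0];
      const take = Math.min(head.length, samples - written);
      out.set(head.subarray(0, take), written);
      if (take === head.length) this.queue.shift();
      else this.queue[0] = head.subarray(take);
      written += take;
    }
    this.buffered -= samples;
    return out;
  }
}
```

The playback side would feed `pull()` into an `AudioBufferSourceNode` or an `AudioWorklet`; when it returns null, output silence and wait for the buffer to refill rather than playing fragments as they arrive.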