I have integrated the Gemini Live API via WebSocket in my backend. However, I am noticing that the responses from the Gemini Live API are consistently slow: roughly a 2–3 second delay before I receive any output.
Could you please help me understand the reason for this delay and suggest how it can be improved?
Hi @Akshay_Kumar_Rajput, welcome to the AI Forum!
The latency mainly comes from three stages: detecting the end of your input (end-of-turn / voice activity detection), network transmission, and model processing. To reduce it, you can optimize how you chunk the audio you stream, reuse a persistent WebSocket session instead of reconnecting for every turn, and use lighter models or shorter contexts where possible.
Thanks!
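Before tuning anything, it helps to measure which of those stages is actually eating the time. Here is a minimal, hedged sketch of per-stage timing instrumentation; the stage names and the `timed` helper are illustrative stand-ins, not part of the Gemini SDK — you would wrap your real end-of-turn wait, send, and receive calls instead of the `time.sleep` stubs:

```python
import time

def timed(label, fn, *args, timings=None, **kwargs):
    """Run fn and record its wall-clock duration under `label`."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    if timings is not None:
        timings[label] = time.monotonic() - start
    return result

# Stub stages standing in for the real pipeline (replace with your own calls).
timings = {}
timed("end_of_turn_detection", lambda: time.sleep(0.05), timings=timings)
timed("network_send", lambda: time.sleep(0.01), timings=timings)
timed("model_response", lambda: time.sleep(0.02), timings=timings)
print({k: round(v, 3) for k, v in timings.items()})
```

If end-of-turn detection dominates (as reported below in this thread), session reuse alone won't help much; if model response dominates, a lighter model or shorter context is the lever to pull.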
I have the same problem, but the delay is even higher (around 5–10 seconds), and it gets worse the longer the conversation goes. I think the main cause for me is end-of-turn detection: for some reason it takes a long time to detect that my speech has ended and to send the turn to the server. When I send a message directly as text, the response is instant.
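One workaround when server-side end-of-turn detection is slow is to do your own endpointing on the client and stop streaming (or signal the turn boundary) as soon as you see enough trailing silence. Below is a minimal sketch of energy-based silence detection on 16-bit little-endian mono PCM; the `threshold` and `needed` values are assumptions you would tune for your microphone and chunk size, and none of this is Gemini SDK API:

```python
import struct

def is_silent(chunk: bytes, threshold: int = 500) -> bool:
    """A 16-bit mono PCM chunk counts as silent if its peak amplitude is low.

    `threshold` is an assumed value; tune it for your input device.
    """
    samples = struct.unpack(f"<{len(chunk) // 2}h", chunk)
    return max(abs(s) for s in samples) < threshold

def end_of_turn_index(chunks, needed: int = 3):
    """Return the index of the chunk that completes `needed` consecutive
    silent chunks (i.e. where the turn can be ended), or None."""
    run = 0
    for i, chunk in enumerate(chunks):
        run = run + 1 if is_silent(chunk) else 0
        if run >= needed:
            return i
    return None

# Usage: synthetic loud and quiet chunks (4 samples each).
loud = struct.pack("<4h", 3000, -3000, 3000, -3000)
quiet = struct.pack("<4h", 10, -10, 10, -10)
print(end_of_turn_index([loud, loud, quiet, quiet, quiet, loud]))
```

In a real pipeline you would run this over your microphone chunks as you stream them, and cut the turn as soon as `end_of_turn_index` fires instead of waiting for the server to decide the speech has ended.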
I'm facing the same issue.
@Shivam_Singh2 please take a look into this.