I have integrated the Gemini Live API via WebSocket in my backend. However, I am noticing that the responses from the Gemini Live API are consistently slow — approximately 2–3 seconds delay before receiving output.
Could you please help me understand the reason for this delay and suggest how it can be improved?
Hi @Akshay_Kumar_Rajput, welcome to the AI Forum!
The delay mainly comes from three sources: detecting the end of your input (end-of-turn / voice-activity detection), network transmission, and model processing time. To reduce it, you can stream smaller input chunks, reuse a persistent WebSocket session instead of reconnecting per turn, and use lighter models or shorter contexts where possible.
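To illustrate the chunking point, here is a minimal sketch: send raw PCM in small fixed-duration chunks rather than one large buffer, so the server can start voice-activity detection sooner. The 16 kHz / 16-bit mono format and 100 ms chunk size here are illustrative assumptions, not API requirements.

```python
def chunk_pcm(audio: bytes, chunk_ms: int = 100,
              sample_rate: int = 16000, bytes_per_sample: int = 2) -> list[bytes]:
    """Split raw mono PCM audio into fixed-duration chunks for streaming.

    Smaller chunks let the server run voice-activity / end-of-turn
    detection sooner than one large buffered utterance would.
    """
    chunk_bytes = sample_rate * bytes_per_sample * chunk_ms // 1000
    return [audio[i:i + chunk_bytes] for i in range(0, len(audio), chunk_bytes)]

# 1 second of 16 kHz 16-bit silence -> ten 100 ms chunks of 3200 bytes each
chunks = chunk_pcm(b"\x00" * 32000)
```

Each chunk would then be sent over the already-open WebSocket session as it becomes available.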
Thanks!
I have the same problem, but the delay is even higher, around 5–10 seconds, and it gets worse the longer the conversation goes. I think the main cause for me is end-of-turn detection: for some reason it takes a long time to detect that my speech has ended before the turn is sent to the server. When I send a message directly as text, the response is instant.
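If end-of-turn detection is the bottleneck, the Live API exposes tuning knobs for its automatic voice-activity detection. A config sketch using the google-genai Python SDK (field names taken from the SDK's `types` module; these are preview APIs, so verify against your installed SDK version, and the 300 ms value is just an illustrative assumption):

```python
from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    realtime_input_config=types.RealtimeInputConfig(
        automatic_activity_detection=types.AutomaticActivityDetection(
            # Higher sensitivity ends the turn on shorter pauses.
            end_of_speech_sensitivity=types.EndSensitivity.END_SENSITIVITY_HIGH,
            # Trailing silence (ms) required before the turn is closed.
            silence_duration_ms=300,
        )
    ),
)
```

This config would be passed to the live `connect` call; lowering the silence window trades a faster end-of-turn for a higher chance of cutting the speaker off mid-sentence.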
Facing the same issue
@Shivam_Singh2, please take a look at this.
Hi @Abivarman & @Sadikh_Shaik,
Apologies for the delayed response. To assist you further, could you please confirm whether you are still encountering delays related to End-of-Turn detection?
@Shivam_Singh2
It has gotten better, but it is still not as "instant" as it should be compared to something like OpenAI's live models. It also struggles to detect my voice while it is talking.
Hello,
Could you please let us know which model you are using when experiencing this response delay issue?
gemini-2.5-flash-native-audio-preview-09-2025
I observe that the first inference after connecting and sending an audio chunk is very slow (5–15 s), while subsequent responses are acceptable (2–3 s).
I usually set the context and resume from a session_resumption_handle.
Is there anything I can do about the first inference?
Example from a first turn:
- TTFB until first text message: 2.56 s
- TTFB until first audio message: 5.442 s
Hello,
Could you please share the full payload details along with a sample of the code that you are using? We would like to reproduce the issue.