Gemini Flash 2.0 Experimental VAD is pretty bad

I am impressed so far with the speed and capabilities of the Flash 2.0 voice in/out (realtime) API. However, it doesn't seem to have very good voice activity detection (VAD) and frequently interrupts the user. In addition, the voices are a little less natural-sounding (to my ear) than some competitors. Just providing feedback.

Hi @Dirk_Coetsee,

Thanks for the feedback, and apologies for the late reply. Your feedback was taken into consideration by our engineering team, and VAD is no longer an issue. An even more powerful 2.5 model is also available now, so please give it a try. You can also disable VAD. In addition, new voices have been added (there are now around 8).
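
For anyone wondering what disabling VAD might look like in practice, here is a minimal sketch that only builds the JSON messages and does not open a connection. It assumes the Live API setup message accepts a `realtimeInputConfig.automaticActivityDetection.disabled` flag and that, with VAD off, the client marks turns itself with `activityStart` / `activityEnd` messages. Please check those field names against the current docs. The model name and both helper functions are hypothetical placeholders.

```python
import json


def build_setup_message(model: str, disable_vad: bool = True) -> str:
    """Build a session setup message (field names are assumptions, see above)."""
    setup = {
        "setup": {
            "model": model,
            "realtimeInputConfig": {
                # Assumed flag: True turns off server-side voice activity detection.
                "automaticActivityDetection": {"disabled": disable_vad},
            },
        }
    }
    return json.dumps(setup)


def activity_marker(start: bool) -> str:
    """Build the assumed manual turn marker the client sends when VAD is off."""
    key = "activityStart" if start else "activityEnd"
    return json.dumps({"realtimeInput": {key: {}}})


if __name__ == "__main__":
    # "models/example-live-model" is a placeholder, not a real model ID.
    print(build_setup_message("models/example-live-model"))
    print(activity_marker(start=True))
    print(activity_marker(start=False))
```

With automatic detection off, the app decides when the user's turn starts and ends (for example with a push-to-talk button), which should avoid the model cutting in mid-sentence.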