I am impressed so far with the speed and capabilities of the Flash 2.0 voice in/out (realtime) API. However, it doesn’t seem to have very good voice activity detection and frequently interrupts the user. In addition, the voices sound a little less natural (to my ear) than those of some competitors. Just providing feedback.