Live API Audio Speech Quality Worse After the Update

The Live API's audio speech model has become significantly worse since the latest update to gemini-2.0-flash-live-001. The old gemini-2.0-flash-exp still appears to be callable through the API, but its output now sounds the same as the new live-001.

The main problems are:

  1. Often the last word of a spoken phrase is cut off before it finishes.
  2. Spoken sentences sound less natural overall, because the words are often hurried and bunched together.

These problems don’t occur every time, but they never happened with the old model.

I understand these changes may have been made to make the speech model more responsive or faster, but the results are worse.

I hope we either get the option to use the old model or see these issues fixed!

I just switched from exp to live-001 and also noticed it’s worse. That seems strange, since I thought this was supposed to be production-ready soon.

It seems Google might have switched the speech model from its previous one to Sesame Labs’ CSM, which exhibits the same kinds of bugs and inconsistencies that the Google model currently shows.

For example, the last words/phonemes are inconsistently cut off.

The question is: why would Google replace a speech model that was working with a worse one? Is CSM perhaps cheaper?