Gemini Live API: token generation suddenly stops

Hi @Lalit_Kumar,
It’s going much better: I’ve completed entire sessions without a single stop. It still happens, though.
Also, it doesn’t always recover, and when it does, the recovery can overlap with Voice Activity Detection.
My use case is a customer service assistant. The assistant may stop and the user asks whether it is still online; meanwhile, the assistant’s voice comes back and finishes what it was saying. By then the user’s question about whether the assistant is still online has also arrived, so the assistant answers that it is still online, and so on. The whole conversation becomes a mess. Maybe the cure is worse than the disease :slight_smile:
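
A minimal client-side sketch of how such a stop could be detected, assuming the client can tell when an audio chunk arrives and when the turn is reported as complete. `StallWatchdog`, its method names, and the timeout value are hypothetical and are not part of the Live API; the sketch only tracks timing and does not solve the overlap with Voice Activity Detection.

```python
import time
from typing import Callable, Optional


class StallWatchdog:
    """Flags a turn as stalled when audio stops arriving before the turn completes.

    Hypothetical helper: the caller must invoke the on_* methods from its own
    receive loop; nothing here talks to the Live API.
    """

    def __init__(self, timeout_s: float = 3.0,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s  # assumed threshold, tune per application
        self.clock = clock
        # Time of the last audio chunk; None means no assistant turn in progress.
        self.last_chunk_at: Optional[float] = None

    def on_audio_chunk(self) -> None:
        # Call whenever an audio chunk from the assistant is received.
        self.last_chunk_at = self.clock()

    def on_turn_complete(self) -> None:
        # Call when the assistant's turn ends normally.
        self.last_chunk_at = None

    def is_stalled(self) -> bool:
        # Stalled = a turn is in progress and no audio arrived within the timeout.
        if self.last_chunk_at is None:
            return False
        return self.clock() - self.last_chunk_at > self.timeout_s
```

Polling `is_stalled()` would at least let the application log each stop with a timestamp, which could then be matched against the session recordings.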

A sure source of stops is when the model has difficulty understanding what the user is saying. For example, in our application the assistant may need to ask for the user’s first and last name and phone number. Once the user has provided them, the assistant must repeat them back and ask the user to confirm that they are correct.
In my tests I used my old landline number, which was ....00.... (international prefix omitted). The double zero in the middle causes the model serious problems. When the assistant asks for confirmation, at least 70% of the time it repeats the number with an extra, third zero, like .....00, and then stops. The user says “I can’t hear you anymore”, and the assistant starts repeating the number with the same error, the third zero, and then stops again. This loop continues until the session is forcibly terminated. What surprised me is that, unlike other problems, which are random because AI is a statistical machine, this behavior is almost systematic.
Please note that our application is in Italian.
If you need it, I have the transcriptions and audio recordings of the sessions. They’re obviously in Italian. There are no privacy or GDPR concerns. We’re still in the testing phase.
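
A minimal sketch of how the extra zero could be caught automatically from the transcriptions, assuming both transcripts render the phone number as digits rather than as spelled-out words. The function names and the sample numbers are made up for illustration; they are not the real number from my tests.

```python
import re


def digits_only(text: str) -> str:
    """Return only the digit characters of a transcript fragment."""
    return re.sub(r"\D", "", text)


def repeated_correctly(user_said: str, assistant_said: str) -> bool:
    """True when the assistant repeated exactly the digits the user gave."""
    return digits_only(user_said) == digits_only(assistant_said)


# Made-up example: the assistant inserts a third zero.
user_number = "555 00 12"
assistant_number = "555 000 12"
if not repeated_correctly(user_number, assistant_number):
    print("mismatch: assistant repeated a different number")
```

A check like this would not prevent the stop, but it would make it easy to count how often the wrong number is repeated across sessions.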

The positive aspects of this update are:

  • The “token generation suddenly stops” issue is less frequent.
  • The quality of voice tone and speech has improved dramatically. It’s much less robotic now. It is much more imaginative in its choice of words, which makes it sound very similar to a human assistant. It is a huge step forward.
  • Function calls now work as expected, which is also a very welcome improvement.

Thank you for your valuable work.
Ciao
