Gemini Live API models 'inputTranscription' hallucinations

Hi everyone,

We’re currently testing a custom Gemini Live model on Vertex AI, and we’ve noticed an issue where the model sometimes generates responses without receiving any actual input.

After some debugging, it looks like this behavior is caused by random input transcription detections that don’t come from the user. These unexpected transcriptions usually follow the same pattern:

  • Several short transcriptions with unrecognizable (essentially random) content,

  • Followed by one final transcription whose content is "None".

Our microservice backend explicitly handles turn-taking and input delivery to the Live API, and it logs every VAD detection to the terminal. When this random issue happens, NO VAD detection is logged, which suggests the problem originates on the Google API side rather than from real user input. We haven’t been able to find a clear cause, and we suspect it is related either to the API itself or to our model deployment setup.

Below is an example trace of one of these cases: you can see the setupComplete event (the first session setup) immediately followed by an unexpected inputTranscription event.

:magnifying_glass_tilted_left: Example log summary (system running in Spanish)

Below is a simplified extract from our logs that shows the issue:

  • The session starts normally:
    setupComplete (session established)
    Twilio WebSocket connected
    6 clients loaded, TTS ready
  • Immediately after setup, the API sends:
    inputTranscription: “या”
    inputTranscription: “, 2, 3, 4, 5, 6,”
    inputTranscription: " 7, 8 9, 10, 11,"
    inputTranscription: " 12"
    inputTranscription: None
  • The model then generates an unexpected response
    modelTurn: “Usted disculpe, no le he entendido muy bien. ¿Podría, por favor, repetir lo que dijo?” (“Excuse me, I didn’t understand you very well. Could you please repeat what you said?”)

Has anyone experienced something similar or knows what could be happening?
Any help or pointers from the Vertex AI / Gemini team would be greatly appreciated!

Thanks in advance :folded_hands: