How to bias input_audio_transcription with a prompt in the Gemini Live API?

Situation

I want the ASR to recognize domain-specific terms in our meetings.

  • What I did: passed a prompt (custom vocabulary + context) via the System Instruction.
  • Issue: those terms are still mistranscribed, so the prompt seems to be ignored.
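For reference, the setup described above might look like the following minimal sketch. It assumes the google-genai Python SDK, which accepts a plain dict as the Live API session config; the helper name `build_live_config` and the instruction wording are hypothetical.

```python
def build_live_config(glossary: list[str]) -> dict:
    """Hypothetical Live API session config mirroring the question's setup.

    The vocabulary lives only in `system_instruction`. Per the reply in this
    thread, that conditions the model's responses but does not reliably
    reach the recognizer behind `input_audio_transcription`.
    """
    instruction = (
        "You are assisting with our internal meetings. "
        "Domain terms that may appear: " + ", ".join(glossary) + "."
    )
    return {
        "response_modalities": ["AUDIO"],
        # An empty config object simply turns on input transcripts.
        "input_audio_transcription": {},
        "system_instruction": instruction,
    }
```

In that SDK the dict would be passed as `config` to `client.aio.live.connect(model=..., config=config)`, and the transcripts arrive as `server_content.input_transcription` on incoming messages.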

Questions

  1. Is there an official way to supply a prompt, vocabulary list, or “hints” that truly affects input_audio_transcription?
  2. If not, what workaround does Google recommend?
  3. Does the recognizer run before any prompt conditioning, making System Instructions ineffective for ASR?

Hi @bibitto,

Welcome to the Google AI Forum!

Gemini’s input_audio_transcription does not reliably support prompt conditioning or biasing via System Instructions. The transcription (ASR) runs independently, before any prompt or context is processed, so vocabulary placed in the System Instruction does not reach the recognizer.

For your use case, please try the Speech-to-Text API first, then feed the transcribed text into Gemini.
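A minimal sketch of that two-step workaround, assuming the Cloud Speech-to-Text v1 REST API (`speech:recognize`), whose `speechContexts` phrase hints are its documented way to bias recognition toward custom vocabulary. The helper names, the `boost` value, and the audio settings (16 kHz LINEAR16) are assumptions; sending the request (authentication, HTTP client) is left out so the sketch stays self-contained.

```python
def build_stt_request(audio_b64: str, phrases: list[str], boost: float = 10.0,
                      language_code: str = "en-US") -> dict:
    """Build a Speech-to-Text v1 `speech:recognize` request body.

    `speechContexts` carries phrase hints that bias recognition toward the
    listed terms; `boost` raises their weight (the default here is an
    assumed starting point, not a recommended value).
    """
    return {
        "config": {
            "languageCode": language_code,
            "encoding": "LINEAR16",   # assumed: raw 16-bit PCM audio
            "sampleRateHertz": 16000,  # assumed sample rate
            "speechContexts": [{"phrases": phrases, "boost": boost}],
        },
        "audio": {"content": audio_b64},  # base64-encoded audio bytes
    }


def extract_transcript(response: dict) -> str:
    """Join the top alternative of each result in a recognize response."""
    return " ".join(
        result["alternatives"][0]["transcript"]
        for result in response.get("results", [])
        if result.get("alternatives")
    )


def build_gemini_prompt(transcript: str, phrases: list[str]) -> str:
    """Hypothetical prompt that hands the biased transcript to Gemini."""
    return (
        "Domain terms: " + ", ".join(phrases) + "\n"
        "Meeting transcript:\n" + transcript + "\n\n"
        "Summarize the meeting, keeping the domain terms spelled as listed."
    )
```

The request body would be POSTed to `https://speech.googleapis.com/v1/speech:recognize`. Synchronous recognition is meant for short clips (about one minute), so full meetings would more likely go through `speech:longrunningrecognize` or streaming recognition. The text from `build_gemini_prompt` is then sent to Gemini as an ordinary text turn.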
