I want to use gemini-2-5-flash for transcribing audio files.
My files are 8 kHz at 16 kbps. Before sending them to the model, I preprocess the audio (keep only the left channel and remove the silent parts from it). After preprocessing I need to save the files, so I have some questions about what format to save them in.
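A minimal sketch of the kind of preprocessing described above, not necessarily the exact pipeline in use. It assumes the audio has already been decoded to 16-bit PCM WAV; the function names, the amplitude threshold of 500, and the 20 ms window are illustrative assumptions.

```python
import array
import sys
import wave


def left_channel(samples: array.array, channels: int) -> array.array:
    """Keep only the first (left) channel of interleaved PCM samples."""
    return samples[::channels]


def drop_silence(samples: array.array, rate: int,
                 threshold: int = 500, window_ms: int = 20) -> array.array:
    """Drop fixed-size windows whose peak amplitude is below `threshold`."""
    window = max(1, rate * window_ms // 1000)
    kept = array.array("h")
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        if max(abs(s) for s in chunk) >= threshold:
            kept.extend(chunk)
    return kept


def preprocess(src_path: str, dst_path: str, threshold: int = 500) -> None:
    """Read a 16-bit PCM WAV, keep the left channel, strip silence, save mono WAV."""
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = src.getnchannels()
        rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    mono = drop_silence(left_channel(samples, channels), rate, threshold)

    if sys.byteorder == "big":
        mono.byteswap()  # back to little-endian before writing
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)  # sample rate is left unchanged here
        dst.writeframes(mono.tobytes())
```

The sketch keeps the original sample rate when saving, so whether to resample is still a separate decision.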
My questions are:
- Should I resample the audio from 8 kHz to 16 kHz? Does it make any difference to Gemini?
- What bitrate should I use? The documentation states that Gemini downsamples audio files to a 16 Kbps data resolution. Does that mean it downsamples every input to a 16 kbps bitrate, that it resamples to a 16 kHz sample rate, or that it converts to a 16-bit PCM bit depth? I find it hard to believe that it downsamples every input to a 16 kbps bitrate.
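To show why the three readings differ, here is the standard arithmetic for uncompressed PCM (bitrate = sample rate × bit depth × channels). The helper name is illustrative, and this says nothing about what Gemini actually does internally.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


print(pcm_bitrate_kbps(8_000, 16))        # 128.0 (mono 8 kHz, 16-bit)
print(pcm_bitrate_kbps(16_000, 16))       # 256.0 (mono 16 kHz, 16-bit)
print(pcm_bitrate_kbps(8_000, 16) / 16)   # 8.0   (compression ratio at 16 kbps)
```

So a 16 kbps file at 8 kHz has to be a compressed encoding, and "16 kbps bitrate", "16 kHz sample rate", and "16-bit depth" are three quite different things.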
Thanks