Gemini Flash 2.5 transcription (gemini-2.5-flash-preview-04-17)

For no apparent reason, Gemini 2.5 Flash, when asked to transcribe an MP3, started spitting out: “This is a LibriVox recording. All LibriVox recordings are in the public domain. For more information, or to volunteer, please visit librivox org Read by Mary Ann Spiegel.” This eats up LOTS of tokens (~170 KB of text added).

Also, a lot of the time it outputs “<start_of_audio>” repeated over and over, with nothing in the prompt asking for it.

Hi @dsoudakov, Welcome to the forum!!

Are you using any audiobooks? I just tried to transcribe an audiobook from LibriVox, and it gives that output because it is there in the audio itself. Please see the screenshot below:

Also, I am not getting this “<start_of_audio>” token in the response. It would be helpful if you could share a screenshot so we can debug further.

Thanks

Hey!
I’m testing it on meeting recordings with a simple prompt: “Transcribe this file. Diarize the speakers. NO timestamps.” This works OK most of the time, but as mentioned above it sometimes outputs strange stuff. The MP3 files are uploaded in 60 s chunks (I had other issues with big files) and added to the prompt as per the docs. I’m using Python.

It would be really helpful if you could share a few screenshots of the strange behavior.

Example from transcription output: <start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio><start_of_audio>1: 10:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00

Thanks for sharing @dsoudakov, I will raise this with our engineering team.
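
For anyone hitting the same output, here is a minimal sketch of a client-side check, assuming the transcript for each 60 s chunk is already available as a plain Python string. It only uses the standard library and does not call the Gemini API. The names `strip_start_tokens`, `looks_degenerate`, and the `MAX_REPEATS` threshold are hypothetical and not from the docs; the idea is just to drop the stray “<start_of_audio>” text and flag runaway loops like the “:00” run above so that chunk can be retried.

```python
import re

# Literal text seen in the example output above; treated here as plain text to remove.
START_TOKEN = "<start_of_audio>"

# Hypothetical threshold: a short unit repeated more than this many times
# back to back is treated as a runaway loop. Tune it for your own data.
MAX_REPEATS = 20

# Matches any unit of 1-20 characters followed by at least MAX_REPEATS
# immediate copies of itself (for example ":00:00:00...").
_RUNAWAY = re.compile(r"(.{1,20}?)\1{%d,}" % MAX_REPEATS, re.DOTALL)


def strip_start_tokens(text: str) -> str:
    """Remove every literal occurrence of the start-of-audio token."""
    return text.replace(START_TOKEN, "")


def looks_degenerate(text: str) -> bool:
    """Return True if the text, with start tokens removed, contains a runaway repetition."""
    return _RUNAWAY.search(strip_start_tokens(text)) is not None
```

A possible use is to call `looks_degenerate()` on each chunk's transcript, re-send only the chunks that get flagged, and run `strip_start_tokens()` on the rest before joining them. Note that this will also flag legitimate long runs of a repeated character, so treat a positive result as a hint rather than proof.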