Transcribing calls with Gemini - labelling speakers wrong

I am transcribing calls using Gemini following this page.

However the audio_timestamp parameter is said to be in Preview. The model is sometimes not identifying the correct speaker e.g. Speaker A’s caption may contain a sentence or two of Speaker B before the speaker changes in the transcript. I thought this parameter could help with that, is there any chance we can get access to it or if you have any ideas why the model would be misidentifying speakers or predicting timestamps incorrectly?

Hi @elisha. Welcome to the forum.

Can you provide some info like which model you are using whether pro or flash and which version 001 or 002?

Hi Govind, I was using gemini-1.5-flash-002 and gemini-1.5-pro-002 and got the same results for both.

Hi Elisha, There are some limitations of these models currently, please check the last section of this doc. There might be some issue with the 002 model currently with audio transcribing which will be fixed in new releases, so I recommend using 001 for audio tasks as of now.

Regarding the “audioTimestamp” parameter, it’s still in preview, so please keep checking on new releases that will eliminate this issue of accurately generating timestamps.