I am transcribing calls using Gemini following this page.
However, the audio_timestamp parameter is listed as being in Preview. The model sometimes misidentifies the speaker, e.g. Speaker A's caption may contain a sentence or two from Speaker B before the speaker changes in the transcript. I thought this parameter could help with that. Is there any chance we can get access to it? Otherwise, do you have any ideas why the model would be misidentifying speakers or predicting timestamps incorrectly?
Hi Elisha, these models currently have some limitations; please check the last section of this doc. There may be an issue with audio transcription in the 002 model that will be fixed in upcoming releases, so I recommend using 001 for audio tasks for now.
Regarding the "audioTimestamp" parameter, it's still in Preview, so please keep an eye on new releases, which should resolve this issue of accurately generating timestamps.
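In the meantime, a request that opts in to the Preview parameter would look roughly like the sketch below. It just builds the JSON body for a Vertex AI `generateContent` call; the field names follow the public REST API, but whether "audioTimestamp" is accepted on your project and model version is an assumption while it remains in Preview.

```python
import json

def build_transcription_request(audio_uri: str) -> dict:
    """Sketch of a generateContent request body that opts in to audio
    timestamps. "audioTimestamp" is the Preview parameter discussed above;
    the bucket path and prompt are placeholders."""
    return {
        "contents": [{
            "role": "user",
            "parts": [
                # Audio file referenced from Cloud Storage
                {"fileData": {"mimeType": "audio/mp3", "fileUri": audio_uri}},
                {"text": "Transcribe this call with speaker labels and timestamps."},
            ],
        }],
        "generationConfig": {
            "audioTimestamp": True,  # Preview: enables timestamp understanding
            "temperature": 0,        # deterministic output helps when reviewing diarization
        },
    }

body = build_transcription_request("gs://my-bucket/call.mp3")
print(json.dumps(body, indent=2))
```

Sending this body to the `generateContent` endpoint (once the parameter is available to you) asks the model to anchor the transcript to audio timestamps, which is what should help with the speaker-attribution drift described above.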