Transcribing calls with Gemini - labelling speakers wrong

elisha · October 21, 2024, 2:08pm

I am transcribing calls using Gemini following this page.

However the audio_timestamp parameter is said to be in Preview. The model is sometimes not identifying the correct speaker e.g. Speaker A’s caption may contain a sentence or two of Speaker B before the speaker changes in the transcript. I thought this parameter could help with that, is there any chance we can get access to it or if you have any ideas why the model would be misidentifying speakers or predicting timestamps incorrectly?

Govind_Keshari · October 24, 2024, 5:39am

Hi @elisha. Welcome to the forum.

Can you provide some info like which model you are using whether pro or flash and which version 001 or 002?

elisha · October 24, 2024, 9:33am

Hi Govind, I was using gemini-1.5-flash-002 and gemini-1.5-pro-002 and got the same results for both.

Govind_Keshari · October 25, 2024, 5:59am

Hi Elisha, There are some limitations of these models currently, please check the last section of this doc. There might be some issue with the 002 model currently with audio transcribing which will be fixed in new releases, so I recommend using 001 for audio tasks as of now.

Regarding the “audioTimestamp” parameter, it’s still in preview, so please keep checking on new releases that will eliminate this issue of accurately generating timestamps.

Topic		Replies	Views
Gemini Pro Timestamp Accuracy Issues in Audio Transcription Gemini API gemini-15 , api	9	772	March 27, 2025
Gemini Flash 2.0 audio transcription timestamps incorrect Gemini API audio	4	777	March 27, 2025
Call to update documentation for Audio Understanding (Refer to timestamps) Gemini API audio , gemini-20 , documentation	1	105	May 31, 2025
Audio timestamp accuracy issue in Gemini 2.0 GA models Gemini API help_request , gemini-20	0	335	March 14, 2025
Speaker Diarization Gemini API audio	1	62	October 15, 2025

Transcribing calls with Gemini - labelling speakers wrong

Related topics