Speaker Diarized and Timestamped Transcription with Gemini

Danny_Simpson · August 16, 2025, 9:26pm

Hi folks!

I’m doing a lab for school, and I was wondering if anyone has had any luck getting Gemini-2.5 to produce diarized, timestamped transcriptions of long-form audio and video (in the 1 to 3 hour range).

I’m fairly unfamiliar with Gemini-2.5, but when I tried the March 03-25 model a couple of months ago it seemed very promising. Has anyone had particular luck with certain prompts and system instructions? Also, is enabling “Thinking” for the flash models at all helpful?

I’m also wondering how folks are handling long-form media given Gemini’s context window. Chunking is the first thing that comes to mind, but I’m not sure the model will be able to keep speaker labels consistent across the whole recording if I chunk.

Any help would be greatly appreciated!
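
A minimal sketch of the chunking idea mentioned above, assuming fixed-length windows with a small overlap so that an utterance cut at a boundary appears whole in at least one chunk. The function name `plan_chunks` and the 10-minute window and 30-second overlap defaults are illustrative assumptions, not values recommended by the Gemini documentation. It only computes the time ranges; cutting the audio and calling the model are left out.

```python
def plan_chunks(total_s, chunk_s=600.0, overlap_s=30.0):
    """Return (start, end) windows in seconds covering [0, total_s].

    Hypothetical helper: the window and overlap sizes are assumptions
    chosen for illustration only.
    """
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    windows = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            break
        # Step back by the overlap so boundary speech is seen twice.
        start = end - overlap_s
    return windows


if __name__ == "__main__":
    # A 2-hour recording split into 10-minute windows with 30 s overlap.
    for start, end in plan_chunks(2 * 60 * 60):
        print(f"{start:8.1f} -> {end:8.1f}")
```

Overlapping windows do not by themselves keep speaker labels consistent between chunks; that would still need something extra, such as describing the known speakers in every prompt.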

Lalit_Kumar · August 18, 2025, 6:33am

Hello,

Welcome to the Forum,

Could you please share a bit more detail about your goals, what exactly you are expecting to achieve, and which model (flash/pro) you are currently using?

Danny_Simpson · August 18, 2025, 6:45am

Hello Lalit,

I’m trying to generate a diarized and timestamped transcript for audio clips that are approximately 60 to 180 minutes long. I want it at the utterance level; I don’t need word-level accuracy.

Ideally I’d like to use the flash model, but so far I’ve only gotten sub-par output from it.

Lalit_Kumar · August 19, 2025, 9:24am

I would recommend going through the audio understanding doc and the video understanding doc in the Gemini API documentation.
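
On the chunking question raised earlier in the thread, here is a minimal sketch of how per-chunk utterances might be stitched back into one timeline. It assumes each chunk's transcript has already been parsed into utterances whose timestamps are relative to the start of that chunk, and that chunks overlap slightly. The `Utterance` type, `merge_chunks`, and `fmt` are hypothetical names, not part of any Gemini SDK, and the sketch does not reconcile speaker labels between chunks.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    start: float   # seconds, relative to the chunk start
    end: float     # seconds, relative to the chunk start
    speaker: str   # label as returned for that chunk
    text: str


def merge_chunks(chunks):
    """chunks: list of (offset_seconds, [Utterance, ...]), sorted by offset.

    Shifts every utterance to absolute time and skips utterances that
    start before the previously kept one ends (overlap duplicates).
    """
    merged = []
    for offset, utterances in chunks:
        for u in utterances:
            shifted = Utterance(u.start + offset, u.end + offset,
                                u.speaker, u.text)
            if merged and shifted.start < merged[-1].end:
                continue  # already covered by the previous chunk
            merged.append(shifted)
    return merged


def fmt(seconds):
    """Format seconds as HH:MM:SS."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"


if __name__ == "__main__":
    chunks = [
        (0.0, [Utterance(0, 5, "Speaker A", "Hello."),
               Utterance(590, 598, "Speaker B", "Hi there.")]),
        (570.0, [Utterance(20, 28, "Speaker B", "Hi there."),
                 Utterance(30, 40, "Speaker A", "Let's begin.")]),
    ]
    for u in merge_chunks(chunks):
        print(f"[{fmt(u.start)} - {fmt(u.end)}] {u.speaker}: {u.text}")
```

Dropping by start time is a deliberately crude way to remove overlap duplicates; it relies on the timestamps from each chunk being roughly accurate, which is not guaranteed.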
