Hi folks!
I’m doing a lab for school, and I was wondering whether anyone has had luck getting Gemini 2.5 to produce diarized, timestamped transcriptions of long-form audio and video (roughly 1 to 3 hours).
I’m fairly new to Gemini 2.5, but when I tried the March (03-25) model a couple of months ago it seemed very promising. Has anyone had particular luck with certain prompts or system instructions? Also, does enabling “Thinking” on the Flash models actually help?
I’m also wondering how folks are handling long-form media given Gemini’s context window. Chunking is the first thing that comes to mind, but I’m not sure the speaker diarization will stay consistent across chunks (i.e. the same person keeping the same speaker label for the whole recording) if I split the file.
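For concreteness, here’s a rough sketch of the kind of chunking I have in mind, assuming fixed-length windows with a small overlap so speech at a boundary isn’t cut off. The 20-minute window and 60-second overlap are arbitrary placeholders, `plan_chunks` and `to_global` are just hypothetical helper names, and this only handles the timestamp bookkeeping. It does nothing to keep speaker labels consistent between chunks, which is the part I’m unsure about.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    index: int
    start_s: float  # chunk start, in seconds from the beginning of the recording
    end_s: float    # chunk end, in seconds (exclusive)


def plan_chunks(total_s: float, window_s: float = 1200.0, overlap_s: float = 60.0) -> list[Chunk]:
    """Split [0, total_s) into fixed-length windows that overlap by overlap_s seconds.

    Window and overlap defaults are placeholder assumptions, not recommended values.
    """
    if window_s <= overlap_s:
        raise ValueError("window must be longer than the overlap")
    chunks: list[Chunk] = []
    step = window_s - overlap_s  # how far each new window advances
    start = 0.0
    while start < total_s:
        end = min(start + window_s, total_s)
        chunks.append(Chunk(len(chunks), start, end))
        if end >= total_s:
            break  # last window reaches the end of the recording
        start += step
    return chunks


def to_global(chunk: Chunk, local_s: float) -> float:
    """Shift a timestamp relative to the chunk's start back onto the full recording's timeline."""
    return chunk.start_s + local_s


if __name__ == "__main__":
    # A 3-hour recording, split with the placeholder defaults.
    for c in plan_chunks(3 * 3600):
        print(c.index, c.start_s, c.end_s)
```

The overlap means the same stretch of speech shows up at the end of one chunk and the start of the next, so duplicate lines would still need to be merged after converting timestamps with something like `to_global`.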
Any help would be greatly appreciated!