Timestamp Generation (Forced Alignment) on 2.0-Pro-Exp

I have been trying for some time to get timestamp generation (forced alignment) working with the Gemini models, using audio files as input. The first model I have found to generate accurate responses consistently is the latest 2.0 Pro experimental model (gemini-2.0-pro-exp-02-05); earlier experimental models only worked intermittently.

I am requesting here that this capability be kept in the eventual GA model. It is extremely useful, particularly for certain non-English languages where reliable forced alignment models do not exist. Unfortunately, the 2.0 Flash model fails to generate reliable outputs for this task.
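
For anyone who wants to compare models on their own files, here is a minimal sketch of a plausibility check for the timestamps a model returns. It assumes you have prompted the model to emit one segment per line as "[MM:SS.mmm] text" or "[HH:MM:SS.mmm] text"; that format, and the function names parse_timestamps and check_timestamps, are my own assumptions and not anything the Gemini API defines. It only flags timestamps that run backwards or past the end of the audio; measuring real accuracy still needs a reference alignment.

```python
import re

# Assumed output format (hypothetical, fixed by your own prompt): one segment
# per line, "[MM:SS.mmm] text" or "[HH:MM:SS.mmm] text".
LINE_RE = re.compile(r"^\[(?:(\d+):)?(\d{1,2}):(\d{1,2}(?:\.\d+)?)\]\s*(.*)$")


def parse_timestamps(output):
    """Return a list of (seconds, text) for every line matching LINE_RE."""
    segments = []
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that carry no timestamp
        hours, minutes, seconds, text = m.groups()
        total = int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)
        segments.append((total, text))
    return segments


def check_timestamps(segments, audio_duration):
    """Return a list of readable problems; an empty list means 'plausible'."""
    problems = []
    prev = None
    for i, (t, _) in enumerate(segments):
        if t > audio_duration:
            problems.append(f"segment {i}: {t:.3f}s is past the end of the audio")
        if prev is not None and t < prev:
            problems.append(f"segment {i}: {t:.3f}s goes backwards from {prev:.3f}s")
        prev = t
    return problems
```

Usage would be something like check_timestamps(parse_timestamps(model_text), audio_duration=183.4), where model_text is the raw text response and the duration comes from your own audio tooling.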

I have been tracking this as well. In my experience the following models all produce accurate timestamps, while the base 2.0 Flash does not:

  • 2.0 Flash Lite
  • 2.0 Flash thinking
  • 2.0 Pro

Hi there,

Does this apply to audio processing only, or is your experience with video similar?

Cheers.

The GA Flash Lite model is now failing at timestamp generation, whereas the experimental model was working well. Quite frustrating.

If anyone from Google is watching: it would be great to understand whether timestamps are supposed to be supported, and why they break on the new GA model.

I have not tested it on video, so I don’t know.

Confirmed on my end as well. The current Flash Lite timestamps are complete crap. Not even close to being accurate.