I have been trying for some time to get timestamp generation (forced alignment) working with the Gemini models, using audio files as input. The first model I have found to generate accurate responses very consistently is the latest 2.0-Pro experimental model (gemini-2.0-pro-exp-02-05). Earlier experimental models only worked intermittently.
I am requesting here that this capability be kept in the eventual GA model. It is extremely useful, in particular for certain non-English languages where no reliable forced alignment models exist. I would also mention that, unfortunately, the 2.0-Flash model fails to generate reliable outputs for this task.
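
For anyone who wants to compare models on this task, here is a minimal sketch of a sanity check for timestamp output. It assumes the model was prompted to return a JSON list of objects with `word`, `start`, and `end` (in seconds). That schema and the `check_alignment` helper are my own illustration, not a format defined by the Gemini API. It only catches structurally broken output (reversed, overlapping, or out-of-range timestamps), not timestamps that are plausible but wrong.

```python
import json


def check_alignment(raw_json, audio_seconds):
    """Return a list of structural problems in a word-level alignment.

    Assumed (hypothetical) schema: [{"word": str, "start": float, "end": float}, ...]
    """
    problems = []
    previous_end = 0.0
    for i, item in enumerate(json.loads(raw_json)):
        start, end = float(item["start"]), float(item["end"])
        if start > end:
            problems.append(f"{i}: start after end")
        if start < previous_end:
            problems.append(f"{i}: overlaps previous word")
        if end > audio_seconds:
            problems.append(f"{i}: ends after audio")
        # Track the latest end time seen so far.
        previous_end = max(previous_end, end)
    return problems
```

An empty list means the output passed these basic checks. Accuracy would still need to be judged against a manually aligned reference.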