Audio timestamp accuracy issue in Gemini 2.0 GA models

Hi,

As other users have already reported, audio timestamps from the Gemini 2.0 models have been unreliable since the models moved from preview to GA.

Through testing, I found that if the same audio clip (tested with MP3 files) is converted into a video (tested with MP4 files with a solid-color background), the timestamps are accurate. This suggests the issue may be specific to how the model processes standalone audio files. This works as a workaround, but it is not ideal because it increases input token usage by about 10x.
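For anyone who wants to try the same workaround, here is a minimal sketch of one way to wrap an audio file in a solid-color MP4. It assumes ffmpeg is installed and on PATH. The helper names (`build_ffmpeg_command`, `audio_to_solid_video`) and the settings (black background, 640x360, 1 fps, H.264 video with AAC audio) are hypothetical choices for illustration, not necessarily what I used in my tests.

```python
import shutil
import subprocess


def build_ffmpeg_command(audio_path: str, video_path: str,
                         color: str = "black", size: str = "640x360",
                         fps: int = 1) -> list[str]:
    """Return an ffmpeg argv that muxes an audio file over a solid-color video.

    Hypothetical helper; the defaults are illustrative assumptions.
    """
    return [
        "ffmpeg",
        "-y",  # overwrite the output file if it already exists
        # Synthetic solid-color video from ffmpeg's lavfi "color" source.
        "-f", "lavfi",
        "-i", f"color=c={color}:s={size}:r={fps}",
        "-i", audio_path,
        # The color source never ends on its own, so stop when the audio ends.
        "-shortest",
        "-c:v", "libx264",
        "-tune", "stillimage",  # x264 tuning for static content
        "-pix_fmt", "yuv420p",  # widely compatible pixel format
        "-c:a", "aac",
        video_path,
    ]


def audio_to_solid_video(audio_path: str, video_path: str) -> None:
    """Run ffmpeg to produce the MP4 (hypothetical helper)."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_ffmpeg_command(audio_path, video_path), check=True)
```

Calling `audio_to_solid_video("clip.mp3", "clip.mp4")` writes an MP4 that can be uploaded in place of the MP3. A low frame rate keeps the file small, but the video still costs far more input tokens than the audio alone.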

Currently, the gemini-2.0-flash-thinking-exp-01-21 model returns accurate audio timestamps, but I am concerned that this could break again once that model moves to GA.

Is anyone at Google aware of this issue, and are there any plans to address it in future updates?
