Gemini 2.0 Flash Video Understanding Issues

I’m experimenting with the Gemini 2.0 Flash model for long video understanding, and I’m finding that the timecodes it provides on longer videos (more than a few minutes) are largely made up. It seems to provide correct timecodes for the first few minutes, and then it starts hallucinating the rest.

An example prompt: given this video, list all scenes and their timecode (format MM:SS).

I’ve tried raw chat, function calling, and structured outputs, but none of them makes a significant difference.
I’ve also experimented with supplying timecodes for some scenes I already know, hoping that would steer the model towards the right timecodes.
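
To measure how far off the returned timecodes are, a small checker can help. The following is a minimal sketch, assuming the model's answer has already been reduced to a list of MM:SS strings; the function names `parse_mmss`, `check_timecodes`, and `nearest_errors` are hypothetical helpers, not part of any Gemini SDK. It flags timecodes that are malformed, past the end of the video, or out of order, and reports how far each known scene start is from the closest timecode the model returned.

```python
import re

# MM:SS with seconds restricted to 00-59; minutes may exceed 59 for long videos.
_TIMECODE = re.compile(r"^(\d{1,3}):([0-5]\d)$")


def parse_mmss(timecode: str) -> int:
    """Convert an MM:SS string to seconds, raising ValueError if malformed."""
    match = _TIMECODE.match(timecode.strip())
    if match is None:
        raise ValueError(f"malformed timecode: {timecode!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


def check_timecodes(timecodes: list[str], duration_s: int) -> list[tuple[str, str]]:
    """Return (timecode, problem) pairs for entries that look suspicious."""
    problems = []
    previous = -1
    for tc in timecodes:
        try:
            seconds = parse_mmss(tc)
        except ValueError:
            problems.append((tc, "malformed"))
            continue
        if seconds > duration_s:
            problems.append((tc, "past end of video"))
            continue
        if seconds < previous:
            problems.append((tc, "out of order"))
            continue
        previous = seconds
    return problems


def nearest_errors(known: list[str], predicted: list[str]) -> dict[str, int]:
    """For each known scene start, distance in seconds to the closest prediction.

    Assumes `predicted` is non-empty and every entry is well-formed.
    """
    predicted_s = [parse_mmss(p) for p in predicted]
    return {k: min(abs(parse_mmss(k) - p) for p in predicted_s) for k in known}
```

For example, `check_timecodes(["00:10", "02:30", "01:15", "12:00"], 600)` flags `01:15` as out of order and `12:00` as past the end of a ten-minute video. A clean result does not prove the timecodes are right; it only catches the obviously impossible ones.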

Has anyone found tricks for reducing timecode hallucinations?
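
Since the timecodes seem fine for the first few minutes, one idea is to split the video into short overlapping chunks, ask for scenes per chunk, and shift each chunk's timecodes back to the full-video timeline. Whether this actually reduces hallucinations is an assumption that has not been tested here. The sketch below covers only the bookkeeping: `chunk_windows` and `to_global` are hypothetical helper names, the 180-second chunk length and 10-second overlap are arbitrary starting values, and cutting the video and calling the model are left out.

```python
def chunk_windows(duration_s: int, chunk_s: int = 180, overlap_s: int = 10) -> list[tuple[int, int]]:
    """Return (start, end) windows in seconds that cover the whole video.

    Consecutive windows overlap by `overlap_s` so a scene change near a
    boundary is visible in both neighbouring chunks.
    """
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    windows = []
    start = 0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end == duration_s:
            break
        start = end - overlap_s
    return windows


def to_global(chunk_start_s: int, local_timecode: str) -> str:
    """Shift an MM:SS timecode reported for one chunk onto the full-video timeline."""
    minutes, seconds = local_timecode.split(":")
    total = chunk_start_s + int(minutes) * 60 + int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"
```

For a 400-second video, `chunk_windows(400)` gives three windows starting at 0, 170, and 340 seconds, and a scene reported at `01:05` inside the second chunk maps to `to_global(170, "01:05")`. Scenes found twice in an overlap region would still need to be de-duplicated afterwards.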

Is it possible that it doesn’t understand the video at all?
I am also trying to get Gemini 2.0 models to understand a one-hour-long YouTube video, but I don’t even get a response.
Maybe in your case it never actually processes the video and just hallucinates based on other text information about it?

Hi @Jonas_Jongejan, apologies for the late response.
It’s been a while, so if you are still facing this issue, could you please try our latest 2.5 Flash and 2.5 Flash-Lite models and let us know if the issue still persists? Thank you!