Video processing - Best approaches for analyzing large videos?

Hello fellow developers,

As of today, high FPS can be used for videos under 20 MB. For larger videos, the recommended approach is to split them into a sequence of shorter clips (each under 20 MB) and use a higher FPS on each clip to get better analysis. That is the high-level approach.

I’m looking to gather practical insights on how you’re handling your use-cases. I’m particularly interested in:

  • Strategies for splitting videos (fixed time vs. scene detection).

  • Techniques for preserving context across chunks (e.g., overlapping, prompt chaining).

  • Tools you’re using for the pre-processing pipeline.

  • Performance of Gemini models on your use case.

Could you share your use case, the main challenges you faced (like losing temporal context), and the approach that ultimately worked for you?

Your valuable insights will directly help us understand user needs and guide the development of future product enhancements.

Hello,

I’ve spent the last few days trying to solve this problem, and there are a bunch of solutions. The data I work with right now is long-form sports streams. To get good understanding I want a higher fps, of course. Locally, I use ffmpeg to downscale to a lower resolution and keep only the n frames per second that I want Gemini to use. This means I can fit much more of a video into the 20 MB without losing any of the frames I need.
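A minimal sketch of the local ffmpeg step described above, written in Python. The default fps, target height, CRF value, and the helper names `build_ffmpeg_cmd` and `downsample` are assumptions for illustration, not the settings actually used in this post. It relies only on the standard ffmpeg `fps` and `scale` filters.

```python
import subprocess


def build_ffmpeg_cmd(src, dst, fps=2, height=480, drop_audio=False):
    """Build an ffmpeg argument list that resamples to `fps` and scales to `height`.

    All default values here are illustrative assumptions.
    """
    # fps=N keeps N frames per second; scale=-2:H preserves the aspect
    # ratio and rounds the width to an even number for the H.264 encoder.
    filters = f"fps={fps},scale=-2:{height}"
    cmd = ["ffmpeg", "-y", "-i", src, "-vf", filters]
    if drop_audio:
        cmd.append("-an")  # drop the audio track to save more space
    # A higher CRF means stronger compression (smaller file, lower quality).
    cmd += ["-c:v", "libx264", "-crf", "28", dst]
    return cmd


def downsample(src, dst, **kwargs):
    """Run the command; requires ffmpeg to be installed and on PATH."""
    subprocess.run(build_ffmpeg_cmd(src, dst, **kwargs), check=True)
```

For example, `downsample("stream.mp4", "stream_small.mp4", fps=5, height=360)` would keep 5 frames per second at 360 lines of height. The output file size still has to be checked against the 20 MB limit afterwards.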

For deployment, it would be much easier if I could just use the Files API and specify an fps there. I wish this were possible; please make it so. In the web app I am trying to develop, ffmpeg-wasm is really slow, and I might be dealing with really high-quality data (with a 4K stream, ~20 MB holds less than 30 s of video). I don’t want to use another API to downsize or get to the right fps.
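The size arithmetic behind the 4K remark can be sketched as follows. The bitrates in the usage note are hypothetical round numbers, not measurements from these streams, and 1 MB is taken as 10^6 bytes.

```python
def seconds_that_fit(limit_mb: float, bitrate_mbps: float) -> float:
    """Seconds of video that fit in `limit_mb` at an average `bitrate_mbps`.

    Assumes 1 MB = 10**6 bytes and 1 Mbit/s = 10**6 bits per second.
    """
    if bitrate_mbps <= 0:
        raise ValueError("bitrate_mbps must be positive")
    # Convert megabytes to megabits, then divide by megabits per second.
    return limit_mb * 8 / bitrate_mbps
```

With a hypothetical 8 Mbit/s source, `seconds_that_fit(20, 8)` gives 20 seconds. If downsampling brought the same video to a hypothetical 0.5 Mbit/s, `seconds_that_fit(20, 0.5)` gives 320 seconds.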

I use overlapping chunks. The overlap length is a pretty arbitrary choice on my part: I just make it at least as long as an average play or event (~20 s). I also ran one normal 1 fps pass over the full video and asked for some global context about player numbers, camera positions, the overall flow of the game, and anything else that might be useful to the individual chunks.
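A rough sketch of how the overlapping chunks and the shared global context could fit together. The 120 s chunk length in the usage note, the prompt wording, and the helper names `chunk_windows` and `build_chunk_prompt` are hypothetical; only the ~20 s overlap and the 1 fps global pass come from the description above.

```python
def chunk_windows(duration_s, chunk_s, overlap_s):
    """Return (start, end) windows in seconds that cover the whole video.

    Consecutive windows share `overlap_s` seconds.
    """
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap_s must satisfy 0 <= overlap_s < chunk_s")
    duration_s = float(duration_s)
    step = chunk_s - overlap_s  # how far each new window advances
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break  # the last window reached the end of the video
        start += step
    return windows


def build_chunk_prompt(global_context, start, end):
    """Prepend the global-pass summary to a per-chunk prompt (wording is illustrative)."""
    return (
        "Global context from a 1 fps pass over the full video:\n"
        f"{global_context}\n\n"
        f"This clip covers {start:.0f}s to {end:.0f}s of the full video. "
        "Describe the events in this clip."
    )
```

For a 300 s video, `chunk_windows(300, 120, 20)` yields three windows starting at 0 s, 100 s, and 200 s, so any event of up to 20 s is fully contained in at least one chunk.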

Not being able to specify a higher fps with the Files API is really a hindrance, and I don’t have a good solution at this point. But otherwise, Gemini works great and has a really good understanding of what is happening moment to moment. Please add this functionality soon! Thanks!