While I’m unable to answer all your questions directly, please refer to the Video understanding notebook. For instance, you’ll realize that video understanding works best with the Gemini 2.0 Flash model, and you can also test other models to compare their performance. Please try it out and let us know if you still have questions.
Higher resolution provides more detail but also requires more resources. You’ll need to balance the resolution based on your task and computational limits.
Frame rate matters because more frames provide more temporal detail, but higher rates can be computationally expensive. Lower frame rates (e.g., 1 frame per second) work well when high temporal resolution isn’t necessary.
Videos are typically encoded by sampling frames at set intervals, like 1 frame per second, skipping intermediate frames to reduce data load without losing crucial information.
Versions like Gemini 1.5 Pro are optimized for video understanding and offer solid performance across different modalities, including videos.
By adjusting these factors—resolution, frame rate, encoding, and choosing the right version—you can optimize the performance of your Gemini model for video understanding.