Optimal Video Pre-processing Parameters (FPS, Resolution) for File API

First, welcome to the Google AI for Developers Forum! :confetti_ball: :confetti_ball:

Thank you for your thoughtful questions regarding video pre-processing for the Gemini File API. Let’s address your concerns:

  1. Frame Rate (FPS) Transcoding:
    The Gemini File API samples videos at 1 frame per second (FPS), as detailed in the Video understanding documentation. Pre-processing your videos to 1 FPS before uploading is a recommended practice: it reduces file size, which means faster uploads and lower storage overhead, without discarding any frames the model would otherwise use. There’s no hidden benefit to uploading videos with higher frame rates (e.g., 30/60 FPS), since the extra frames are not sampled.

  2. Video Resolution:
    The Gemini File API processes videos at a default media resolution. Higher resolutions (e.g., 1080p or 4K) carry more detail, but they do not necessarily improve the model’s results on tasks like object detection or text recognition, because the model may internally downscale frames before processing. In many cases, downscaling to a standard resolution (e.g., 720p) before uploading is more efficient and can reduce upload and processing time.
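
To make the two points above concrete, here is a minimal sketch of how the transcoding step could be scripted in Python around the `ffmpeg` command-line tool. It assumes `ffmpeg` is installed and on your PATH; the helper names (`build_ffmpeg_cmd`, `transcode`) and the file names are hypothetical, and the 1 FPS / 720p defaults simply mirror the values discussed above rather than any official setting.

```python
import shlex
import subprocess


def build_ffmpeg_cmd(src: str, dst: str, fps: int = 1, height: int = 720) -> list[str]:
    """Return an ffmpeg argument list that resamples to `fps` and scales to `height` pixels tall."""
    # fps=N drops (or duplicates) frames so the output has N frames per second.
    # scale=-2:H sets the height to H, keeps the aspect ratio, and picks an even width.
    video_filter = f"fps={fps},scale=-2:{height}"
    return [
        "ffmpeg",
        "-y",             # overwrite dst if it already exists
        "-i", src,
        "-vf", video_filter,
        "-c:a", "copy",   # assumption: pass the audio stream through unchanged
        dst,
    ]


def transcode(src: str, dst: str, **kwargs) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with a non-zero status."""
    subprocess.run(build_ffmpeg_cmd(src, dst, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so you can inspect it first.
    print(shlex.join(build_ffmpeg_cmd("input.mp4", "output_1fps_720p.mp4")))
```

Note that the `scale` filter as written will also upscale a source that is smaller than 720p, so you may want to skip it for low-resolution inputs. It is worth spot-checking a few transcoded clips against your own prompts before converting a whole library.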

For more detailed information, please refer to the Video understanding documentation.

If you have any further questions or need assistance with video pre-processing, feel free to ask!