Optimal Video Pre-processing Parameters (FPS, Resolution) for File API

I have a question about best practices for pre-processing videos before uploading them via the File API for video understanding. My goal is to optimize for performance, cost, and efficiency.
The official documentation (Video understanding, Gemini API, Google AI for Developers) states that video is processed at a sample rate of 1 frame per second (fps). This raises a couple of key questions about pre-processing:

  1. Frame Rate (FPS) Transcoding:
    Given that the model samples video at 1fps, is it a recommended best practice to pre-process our videos and transcode them down to 1fps before uploading?
    It seems this would significantly reduce the file size, leading to faster uploads and lower storage overhead, without losing any information the model would use. Is this assumption correct, or is there a hidden benefit to uploading a video with a higher frame rate (e.g., 30/60 fps)?
  2. Video Resolution:
    I could not find any explicit guidance in the documentation regarding optimal or required video resolutions.
    • Is there a recommended resolution (e.g., 480p, 720p, 1080p) for video uploads?
    • Does providing a higher resolution (e.g., 1080p or 4K) improve the model’s performance on tasks like object detection or text recognition within the video?
    • Or are frames downscaled to a standard internal resolution before processing, making it more efficient to simply upload a standard-definition video?

First, welcome to the Google AI for Developers Forum!

Thank you for your thoughtful questions regarding video pre-processing for the Gemini File API. Let’s address your concerns:

  1. Frame Rate (FPS) Transcoding:
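    By default, the Gemini API samples uploaded videos at 1 frame per second (FPS), as described in the Video understanding documentation. Because frames beyond that rate are not used, transcoding your videos to 1 FPS before uploading is a reasonable optimization: it reduces file size, which means faster uploads and lower storage overhead, without removing information the model would see at the default sampling rate. There is generally no hidden benefit to uploading at higher frame rates (e.g., 30/60 FPS), and doing so does not change token usage, since tokens are counted per sampled frame. Two caveats: fast-moving content can change significantly between sampled frames, and a 1 FPS file cannot be resampled at a higher rate later, so keep the original if you might need it. Also keep the audio track, since the model processes audio alongside the frames.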

  2. Video Resolution:
    Sampled frames are not tokenized at their uploaded pixel dimensions. Each frame is processed at a fixed media resolution (the default setting, or a lower one if you request it), and the documented per-frame token count is the same regardless of the input size. Higher source resolutions (e.g., 1080p or 4K) may offer more detail, but they do not necessarily improve performance on tasks like object detection or text recognition, because frames are effectively downscaled before processing. In most cases, downscaling to a standard resolution (e.g., 720p) before uploading is more efficient: it reduces file size and upload time without changing token usage. If small on-screen text matters for your use case, test a few clips at both resolutions and compare the results.
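
To illustrate why neither the uploaded frame rate nor the pixel resolution affects cost, here is a rough token estimator. It is a sketch that assumes the per-unit figures given in the Video understanding documentation at the time of writing (about 258 tokens per sampled frame at default media resolution, 66 at low media resolution, and 32 tokens per second of audio); please verify them against the current docs. `estimate_video_tokens` is a hypothetical helper, not part of any SDK.

```python
# Assumed per-unit figures from the Gemini video docs at the time of writing;
# verify against the current documentation before relying on them.
TOKENS_PER_FRAME = {"default": 258, "low": 66}  # per sampled frame
AUDIO_TOKENS_PER_SECOND = 32


def estimate_video_tokens(duration_s: float,
                          sample_fps: float = 1.0,
                          media_resolution: str = "default",
                          include_audio: bool = True) -> int:
    """Approximate prompt tokens for a clip of `duration_s` seconds.

    The uploaded file's own frame rate and pixel size are deliberately not
    parameters: only the sampling rate and the media resolution setting
    determine the frame cost.
    """
    frames = int(duration_s * sample_fps)  # frames the model actually samples
    tokens = frames * TOKENS_PER_FRAME[media_resolution]
    if include_audio:
        tokens += int(duration_s * AUDIO_TOKENS_PER_SECOND)
    return tokens
```

For a 10-minute clip at the default 1 FPS, `estimate_video_tokens(600)` gives 174,000 tokens (154,800 for frames plus 19,200 for audio), and `estimate_video_tokens(600, media_resolution="low")` gives 58,800. Whether the source was 30 FPS 4K or 1 FPS 720p never enters the calculation.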

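If you do pre-process locally, one way to apply both suggestions in a single pass is ffmpeg's `fps` and `scale` filters. This is a minimal sketch, assuming ffmpeg is installed and on your PATH; the helper names, the 720p cap, and the encoder settings (libx264 at CRF 23, AAC audio at 64 kbit/s) are illustrative choices, not values recommended by Google.

```python
import shlex
import subprocess


def build_ffmpeg_cmd(src: str, dst: str, fps: float = 1.0,
                     max_height: int = 720) -> list[str]:
    """Build (but do not run) an ffmpeg command that resamples `src` to `fps`
    frames per second and caps its height at `max_height` pixels."""
    return [
        "ffmpeg", "-y",  # -y: overwrite dst if it already exists
        "-i", src,
        # fps=N drops frames to hit the target rate; scale=-2:'min(H,ih)'
        # keeps the aspect ratio (even width for libx264) and never upscales.
        "-vf", f"fps={fps},scale=-2:'min({max_height},ih)'",
        "-c:v", "libx264", "-crf", "23", "-pix_fmt", "yuv420p",
        # Re-encode audio instead of dropping it; the model uses it too.
        "-c:a", "aac", "-b:a", "64k",
        dst,
    ]


def transcode(src: str, dst: str, fps: float = 1.0, max_height: int = 720) -> None:
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_cmd(src, dst, fps, max_height), check=True)


if __name__ == "__main__":
    # Print the exact command for inspection before running transcode().
    print(shlex.join(build_ffmpeg_cmd("input.mp4", "output_1fps_720p.mp4")))
```

Keeping command construction separate from execution makes it easy to inspect or log the command first. Because `scale` uses `min(720,ih)`, sources that are already 720p or smaller keep their original height.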
For more detailed information, please refer to the Video understanding documentation.

If you have any further questions or need assistance with video pre-processing, feel free to ask!