File API, upload video, how to increase FPS?

The Cookbook says this about uploading a video: “NOTE: The File API samples the video at 1 frame per second (FPS). This sampling rate may be subject to change to provide the best inference quality.”

How can I increase this rate, e.g., to 5 frames per second?

The Video cookbook is brand new, and shows the newer, more convenient API. Up to last week, you had to split the video into individual frames, separate out the audio track, and then feed the model all the parts (in sequence) when using AI Studio. People tended to use ffmpeg to make frame collections out of a video clip. That allows you to specify how many frames you want per second.
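For reference, the frame-splitting step described above can be sketched roughly like this. It is only a minimal illustration, assuming ffmpeg is installed and on your PATH; the helper names (`build_frame_extraction_cmd`, `extract_frames`) and the `frame_%05d.jpg` naming pattern are made up for this example and do not come from the cookbook.

```python
import subprocess
from pathlib import Path


def build_frame_extraction_cmd(video_path: str, out_dir: str, fps: float = 5.0) -> list[str]:
    """Build an ffmpeg command that writes `fps` frames per second of video as JPEGs."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    # Hypothetical output naming: frame_00001.jpg, frame_00002.jpg, ...
    pattern = str(Path(out_dir) / "frame_%05d.jpg")
    return [
        "ffmpeg",
        "-i", video_path,        # input video
        "-vf", f"fps={fps:g}",   # ffmpeg's fps filter resamples to the requested rate
        pattern,
    ]


def extract_frames(video_path: str, out_dir: str, fps: float = 5.0) -> list[Path]:
    """Run ffmpeg and return the extracted frame files in order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_frame_extraction_cmd(video_path, out_dir, fps), check=True)
    return sorted(Path(out_dir).glob("frame_*.jpg"))
```

Calling `extract_frames("clip.mp4", "frames", fps=5)` would leave about five JPEGs per second of video in `frames/`, which you can then send to the model in sequence. The audio track would still need to be separated out in its own step.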

So, if nobody comes up with a better idea, using the ‘old’ method to handle video will achieve what you want.

The ‘old’ Video cookbook is still visible through the GitHub repository.

Thank you! I’m okay using the “old” method. Can I also use the “old” model? Do you know if we can access the version before the major update this week? The “old” model gave me much better performance in my task.

The FPS feature is now officially supported.

But I read the documentation here, and there is no way to set a custom FPS for the File API. It only works for inline video, which means we have to pass the whole video in the payload every time we query for an answer. That is not as fast as the File API, where the video is uploaded and processed once.

For file-based FPS, preprocessing is not covered yet, which means we only generate the preprocessed file when FPS = 1. If you specify a higher or lower FPS, the video is preprocessed on the fly anyway (expect high latency here :frowning:). However, I’m writing a preprocessing cache for video, so you should see it supported soon (probably in 2 weeks).

Also, I’m curious about your use case. Are you using it for realtime chat? Why does latency matter here? I could talk to the PM about prioritizing this.

Hey, thanks for the blazingly fast response. I just want better results when processing video: the more the model can “see”, the better the response. Looks like you’re working on it and the video preprocessing is still a work in progress, so I’ll hold off on using it for now. When you finish your sprint (or whatever the 2-week thing is called), I’ll go back and read the up-to-date documentation to find all the catches and tradeoffs we have to be aware of when using this video preprocessing. Again, thanks for your fast response, looking forward to your work :saluting_face:

But I’m really curious about the implementation detail that makes fps=1 so different from the rest. I thought fps would be treated as a variable: when the Files API receives a file, it would also take in the fps (if configured) before starting the processing phase. Looks like the Files API wasn’t designed with this in mind at the beginning, or maybe there’s some technical detail that really makes fps=1 an easier default for Files API video processing :thinking:

OK, my use case is to build a chat that understands not only text but also visual and audio context. The good thing is that users only talk to the model through text, not voice (that’s the nature of the people I’m aiming at: office workers and students). Users can wait, but not too long, which is why I’m asking about processing speed when fps != 1. If it’s much, much slower, I think it’s better to stay at 1 fps and ask users to be aware of this. Other workarounds: slow down the video before sending, limit the video size to avoid long processing times, or wait for your side to deliver new updates to the Files API (I think that will definitely come some day; 1 fps is just too little to get all the details from a visual context, it needs to be near 24 fps to mimic a real human).
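To make the “slow down the video” workaround concrete: if the service samples at 1 fps and the clip is stretched 5x, each second of the original ends up covered by roughly 5 sampled frames. Here is a rough sketch of that arithmetic plus a matching ffmpeg command, assuming ffmpeg is available. The function names are made up for this example, and `-an` drops the audio track (stretching only the video would otherwise leave the two out of sync), so this only helps with the visual side.

```python
def slowdown_factor(target_fps: float, sampling_fps: float = 1.0) -> float:
    """How many times slower the clip must play so that sampling at
    `sampling_fps` captures about `target_fps` frames per original second."""
    if target_fps <= 0 or sampling_fps <= 0:
        raise ValueError("frame rates must be positive")
    return target_fps / sampling_fps


def build_slowdown_cmd(src: str, dst: str, target_fps: float,
                       sampling_fps: float = 1.0) -> list[str]:
    """Build an ffmpeg command that stretches the video timeline by the factor above."""
    factor = slowdown_factor(target_fps, sampling_fps)
    return [
        "ffmpeg",
        "-i", src,
        "-vf", f"setpts={factor:g}*PTS",  # multiply timestamps: larger factor = slower playback
        "-an",                            # drop audio instead of leaving it out of sync
        dst,
    ]
```

The obvious tradeoff is that the stretched clip is `factor` times longer, so upload size, processing time, and the number of frames the model has to read all grow with it.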

The File API is somewhat legacy code. It works like an explicit cache. We recently launched an implicit cache for text and images. But for other modalities like documents and video, most of the time goes into frame extraction. As a follow-up, an implicit cache for document and video preprocessing will be added soon. So fps will be a parameter used to hit the implicit cache (invisible to you). Will that make your flow much easier?

Yes. As long as the video can be processed at a configurable fps, uploaded once, and then reused much faster in later API calls, that’s fine. Looking forward to it :saluting_face:
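Just to illustrate the idea of fps being part of the cache lookup, here is a toy sketch. It is not how the actual service is implemented; the `PreprocessCache` class and the `extract` callback are hypothetical, and a plain in-memory dict stands in for whatever storage the real cache uses.

```python
import hashlib
from typing import Callable


class PreprocessCache:
    """Toy cache: preprocessed frames are keyed by (content hash, fps)."""

    def __init__(self, extract: Callable[[bytes, float], list]):
        self._extract = extract  # the expensive frame-extraction step (hypothetical)
        self._store: dict[tuple[str, float], list] = {}
        self.misses = 0

    def get_frames(self, video_bytes: bytes, fps: float = 1.0) -> list:
        # The same video at a different fps is a different cache entry.
        key = (hashlib.sha256(video_bytes).hexdigest(), float(fps))
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._extract(video_bytes, fps)
        return self._store[key]
```

With something like this, the first request for a video at fps=5 pays the extraction cost, and later requests for the same video at the same fps reuse the stored frames without the caller having to manage anything.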