We have implemented VideoMAE in #keras and ported the official #pytorch weights. Video Masked Autoencoder (VideoMAE) is a data-efficient learner for self-supervised video pre-training: masking a large fraction of video patches makes reconstruction a more challenging self-supervision task and encourages the model to extract more effective video representations.
A total of 12 checkpoints are available in both #SavedModel and #h5 formats, covering the top benchmark datasets: Kinetics-400, Something-Something-v2, and UCF101.
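As a rough sketch of how a downloaded checkpoint might be loaded (the file names below are placeholders, and the exact loading path depends on how a given checkpoint was exported):

import tensorflow as tf
from videomae import VideoMAE_ViTS16FT

# A SavedModel checkpoint is a directory and can be loaded directly;
# 'TFVideoMAE_S_K400_16x224_FT' is a placeholder for whichever checkpoint you download.
model = tf.keras.models.load_model('TFVideoMAE_S_K400_16x224_FT')

# An H5 checkpoint can be restored onto a freshly built model via load_weights;
# whether the .h5 holds a full model or weights only depends on how it was exported.
model = VideoMAE_ViTS16FT(img_size=224, patch_size=16, num_classes=400)
model.load_weights('TFVideoMAE_S_K400_16x224_FT.h5')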
Inference
With the fine-tuned encoder model of VideoMAE, we can run inference on a video. The example below uses a sample from the Kinetics-400 test set.
>>> import numpy as np
>>> import tensorflow as tf
>>> from videomae import VideoMAE_ViTS16FT
>>> # read_video, frame_sampling, and label_map_inv are utility helpers from this repository
>>> model = VideoMAE_ViTS16FT(
...     img_size=224, patch_size=16, num_classes=400
... )
>>> container = read_video('sample.mp4')
>>> frames = frame_sampling(container, num_frames=16)
>>> y = model(frames)
>>> y.shape
TensorShape([1, 400])
>>> probabilities = tf.nn.softmax(y)
>>> probabilities = probabilities.numpy().squeeze(0)
>>> confidences = {
...     label_map_inv[i]: float(probabilities[i])
...     for i in np.argsort(probabilities)[::-1]
... }
>>> confidences
{
'playing_cello': 0.6552159786224365,
'snowkiting': 0.0018940207082778215,
'deadlifting': 0.0018381892004981637,
'playing_guitar': 0.001778001431375742,
'playing_recorder': 0.0017528659664094448,
}
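The frame_sampling helper above comes from this repository. As a rough idea of what that preprocessing step involves, here is a minimal sketch of uniform temporal sampling and resizing; the function name, normalization, and other details are assumptions and may differ from the repository's actual implementation.

import numpy as np
import tensorflow as tf

def uniform_frame_sampling(video, num_frames=16, size=224):
    # video: np.ndarray of shape (num_total_frames, H, W, 3), uint8.
    # Pick `num_frames` indices spread evenly across the clip.
    total = video.shape[0]
    indices = np.linspace(0, total - 1, num_frames).astype(np.int32)
    clip = video[indices]
    # Resize spatially and scale pixel values to [0, 1];
    # the real pipeline may use a different normalization.
    clip = tf.image.resize(clip, (size, size)) / 255.0
    # Add a batch axis -> (1, num_frames, size, size, 3).
    return tf.expand_dims(clip, axis=0)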
Visualization
Below are some reconstructed video samples produced by the VideoMAE masked-autoencoder pre-trained models with different mask ratios.
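The masked autoencoder's reconstruction targets and outputs are flattened patch tokens, so visualizing them requires folding the tokens back into a video clip. The sketch below shows a generic un-patchify step for a ViT-S/16 style configuration (patch size 16, tubelet size 2, 16 frames at 224x224); the function name and shapes are assumptions, not the repository's exact API.

import numpy as np

def unpatchify(tokens, num_frames=16, img_size=224, patch_size=16, tubelet_size=2):
    # tokens: (num_patches, tubelet_size * patch_size * patch_size * 3), where
    # num_patches = (num_frames / tubelet_size) * (img_size / patch_size) ** 2.
    t = num_frames // tubelet_size
    h = w = img_size // patch_size
    x = tokens.reshape(t, h, w, tubelet_size, patch_size, patch_size, 3)
    # Reorder to (t, tubelet, h, patch, w, patch, channel) and merge the paired axes.
    x = np.transpose(x, (0, 3, 1, 4, 2, 5, 6))
    # Result: a viewable clip of shape (num_frames, img_size, img_size, 3).
    return x.reshape(num_frames, img_size, img_size, 3)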