How do we process videos to feed to a Deep Learning model and train it? Can we borrow concepts from image and text models and combine those to train a video classification model? Yes, we can.
My latest example on keras.io shows you how:
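A common first step in any such recipe is turning a variable-length video into a fixed-length stack of frames before the model sees it. Here is a minimal, stdlib-only sketch of uniform frame sampling (the function name and the repeat-last-frame padding strategy are my own illustration, not necessarily what the keras.io example does):

```python
def sample_frame_indices(total_frames, num_samples):
    """Pick num_samples evenly spaced frame indices from a clip.

    Short clips (fewer frames than num_samples) are padded by
    repeating the last frame index.
    """
    if total_frames <= num_samples:
        idx = list(range(total_frames))
        idx += [total_frames - 1] * (num_samples - total_frames)
        return idx
    step = (total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]

# A 100-frame clip reduced to 10 evenly spaced frames:
print(sample_frame_indices(100, 10))  # [0, 11, 22, 33, 44, 55, 66, 77, 88, 99]
```

The sampled frames can then be run through an image backbone frame-by-frame, and the resulting feature sequence fed to a sequence model, which is the "borrow from image and text models" idea in the post.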
Nice work Sayak!! Added to my to-read list!!
A Transformer variant is coming soon. Stay tuned, Gus!
Recently we have added MoViNets for Action Recognition on Mobile:
https://github.com/tensorflow/models/tree/master/official/vision/beta/projects/movinet
Here’s the one: Video Classification with Transformers
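For readers curious what the Transformer buys here: instead of processing frame features strictly in order, each frame's features can attend to every other frame's. A toy NumPy sketch of single-head self-attention over per-frame feature vectors (random weights and dimensions are hypothetical; the actual example builds this with Keras layers):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(frames, rng):
    """Single-head scaled dot-product self-attention.

    frames: (num_frames, d) matrix of per-frame feature vectors.
    Returns a (num_frames, d) matrix where each row mixes
    information from all frames, weighted by attention scores.
    """
    n, d = frames.shape
    # random projection weights for query/key/value (illustration only)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = frames @ Wq, frames @ Wk, frames @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))  # (n, n) attention weights
    return attn @ v

rng = np.random.default_rng(0)
feats = rng.standard_normal((10, 16))  # 10 frames, 16-dim features each
out = self_attention(feats, rng)
print(out.shape)  # (10, 16)
```

Mean-pooling the output rows and adding a dense softmax head would give a simple clip-level classifier on top of this.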