Hi folks,
I hope you’re doing well.
Researchers have tried different ways to train Vision Transformers (ViTs) well. Two of the simplest recipes have clearly stood out: training with stronger regularization for longer, and distilling from a well-trained CNN.
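In case it helps to see what "distilling through attention" boils down to, here is a minimal sketch of DeiT's hard-distillation objective, assuming a student that exposes separate logits for the class token and the distillation token (the names below are illustrative, not taken from the repository):

```python
import tensorflow as tf

# Both heads emit raw logits; labels are integer class IDs.
cce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def hard_distillation_loss(labels, cls_logits, distill_logits, teacher_logits):
    # The class-token head learns from the ground-truth labels, while the
    # distillation-token head learns from the teacher CNN's hard predictions.
    teacher_labels = tf.argmax(teacher_logits, axis=-1)
    return 0.5 * cce(labels, cls_logits) + 0.5 * cce(teacher_labels, distill_logits)
```

At inference time, DeiT averages the predictions of the two heads.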
In my latest project, I implement the DeiT family of ViT models, port the pre-trained parameters into the implementation, and provide code for off-the-shelf inference, fine-tuning, visualizing attention rollout plots, and distilling ViT models through attention (quick sketches of the inference and rollout pieces follow the links below). Here are the important links:
- Code for all the implementations: GitHub - sayakpaul/deit-tf (PyTorch -> Keras model porting code for DeiT models, plus fine-tuning and inference notebooks)
- Pre-trained DeiT models in TensorFlow / Keras: the deit collection on Kaggle
- Tutorial: Distilling Vision Transformers
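To give a flavor of the off-the-shelf inference workflow, here is a minimal sketch that loads one of the models as a hub.KerasLayer; the model handle below is hypothetical, so substitute a real one from the Kaggle collection linked above:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Hypothetical handle for illustration only; pick an actual one from the collection.
MODEL_HANDLE = "https://tfhub.dev/sayakpaul/deit_base_patch16_224/1"

model = tf.keras.Sequential([hub.KerasLayer(MODEL_HANDLE)])

# DeiT models expect 224x224 RGB inputs; a random tensor stands in here for a
# properly preprocessed image.
image = tf.random.uniform((1, 224, 224, 3))
logits = model(image)
print(tf.argmax(logits, axis=-1))
```

And since attention rollout plots came up above, here is a compact sketch of the rollout computation itself (Abnar & Zuidema, 2020), assuming you can extract per-layer attention maps of shape (num_heads, num_tokens, num_tokens) from the model:

```python
import numpy as np

def attention_rollout(attentions):
    # `attentions`: list of per-layer maps, each (num_heads, num_tokens, num_tokens).
    rollout = np.eye(attentions[0].shape[-1])
    for attn in attentions:
        attn = attn.mean(axis=0)                         # average over heads
        attn = attn + np.eye(attn.shape[-1])             # account for residual connections
        attn = attn / attn.sum(axis=-1, keepdims=True)   # re-normalize rows
        rollout = attn @ rollout                         # propagate through the layers
    # Row 0 shows how strongly the class token attends to each patch.
    return rollout
```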
Fun fact: with DeiT, I crossed a century of models contributed to TF-Hub in about two years' time (101 models).
Thanks to @fchollet for reviewing the tutorial. Thanks to @ariG23498, who implemented some portions of the ViTClassifier class as shown in the tutorial.
Don’t hesitate to reach out if you have any questions. Have a great day!