I implemented the recent NeurIPS 2021 paper Transformer in Transformer (TNT) in TensorFlow. TNT applies attention inside local patches, pairing pixel-level attention within each patch with the usual patch-level attention across the image. The paper reports state-of-the-art performance on image classification, beating ViT and DeiT at similar computational cost.
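To make the pixel/patch pairing concrete, here is a minimal sketch of one TNT block in TensorFlow. This is not taken from the linked implementation: the `TNTBlock` class, its default dimensions, and the two-input `call` signature are assumptions for illustration, and the class token and positional encodings from the paper are omitted.

```python
import tensorflow as tf
from tensorflow.keras import layers


class TNTBlock(layers.Layer):
    """One Transformer-in-Transformer block (illustrative sketch): an inner
    transformer attends over pixel embeddings within each patch, an outer
    transformer attends over patch embeddings, and the inner output is
    projected and added to the patch embeddings before the outer attention."""

    def __init__(self, pixel_dim=24, patch_dim=384, num_pixels=16,
                 inner_heads=4, outer_heads=6, mlp_ratio=4, **kwargs):
        super().__init__(**kwargs)
        self.num_pixels, self.pixel_dim = num_pixels, pixel_dim
        # Inner (pixel-level) transformer sub-layers
        self.inner_norm1 = layers.LayerNormalization(epsilon=1e-6)
        self.inner_attn = layers.MultiHeadAttention(
            num_heads=inner_heads, key_dim=pixel_dim // inner_heads)
        self.inner_norm2 = layers.LayerNormalization(epsilon=1e-6)
        self.inner_mlp = tf.keras.Sequential([
            layers.Dense(pixel_dim * mlp_ratio, activation="gelu"),
            layers.Dense(pixel_dim)])
        # Projection that folds pixel embeddings back into the patch embedding
        self.proj_norm = layers.LayerNormalization(epsilon=1e-6)
        self.proj = layers.Dense(patch_dim)
        # Outer (patch-level) transformer sub-layers
        self.outer_norm1 = layers.LayerNormalization(epsilon=1e-6)
        self.outer_attn = layers.MultiHeadAttention(
            num_heads=outer_heads, key_dim=patch_dim // outer_heads)
        self.outer_norm2 = layers.LayerNormalization(epsilon=1e-6)
        self.outer_mlp = tf.keras.Sequential([
            layers.Dense(patch_dim * mlp_ratio, activation="gelu"),
            layers.Dense(patch_dim)])

    def call(self, pixel_embed, patch_embed):
        # pixel_embed: (batch * num_patches, num_pixels, pixel_dim)
        # patch_embed: (batch, num_patches, patch_dim)
        # Inner transformer: pixels attend to each other within one patch
        x = self.inner_norm1(pixel_embed)
        pixel_embed = pixel_embed + self.inner_attn(x, x)
        pixel_embed = pixel_embed + self.inner_mlp(self.inner_norm2(pixel_embed))
        # Fuse the updated pixel embeddings into the patch embeddings
        batch = tf.shape(patch_embed)[0]
        num_patches = tf.shape(patch_embed)[1]
        flat = tf.reshape(pixel_embed,
                          (batch, num_patches, self.num_pixels * self.pixel_dim))
        patch_embed = patch_embed + self.proj(self.proj_norm(flat))
        # Outer transformer: patches attend to each other across the image
        y = self.outer_norm1(patch_embed)
        patch_embed = patch_embed + self.outer_attn(y, y)
        patch_embed = patch_embed + self.outer_mlp(self.outer_norm2(patch_embed))
        return pixel_embed, patch_embed


# Hypothetical sizes: 196 patches of 16 sub-patch "pixels" each, batch of 2
block = TNTBlock()
pixels = tf.random.normal((2 * 196, 16, 24))
patches = tf.random.normal((2, 196, 384))
pixels, patches = block(pixels, patches)
print(pixels.shape, patches.shape)  # (392, 16, 24) (2, 196, 384)
```

The key design point is that the two streams stay separate: pixel embeddings are only folded into the patch stream through the linear projection, so patch-level attention cost stays comparable to a plain ViT block.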
This is a nice implementation as well; thanks for sharing!