In my latest Keras example, I minimally implement “Augmenting Convolutional networks with attention-based aggregation” by Touvron et al.
The main idea is to use a non-pyramidal convnet architecture and to replace the pooling layer with a transformer block. The transformer block acts as a cross-attention layer that attends to the feature maps most useful for the classification decision.
The attention maps from the transformer block also help with the interpretability of the model: they let us know which part (patch) of the image the model is really focused on when making a classification decision.
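To give a feel for the mechanism, here is a minimal sketch of such an aggregation layer in Keras. The layer name, head count, and exact wiring are my own illustrative assumptions, not the tutorial's exact code: a single learned class token cross-attends to the flattened convnet feature maps, and the returned attention scores are what get visualized for interpretability.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


class AttentionPooling(layers.Layer):
    """Sketch of attention-based aggregation: a learned class token
    pools the convnet feature maps via cross-attention."""

    def __init__(self, dim, num_heads=1, **kwargs):
        super().__init__(**kwargs)
        self.attention = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=dim
        )
        self.norm = layers.LayerNormalization()

    def build(self, input_shape):
        # A single learned "class token" that queries the patches.
        self.class_token = self.add_weight(
            name="class_token",
            shape=(1, 1, input_shape[-1]),
            initializer="random_normal",
            trainable=True,
        )

    def call(self, inputs):
        # inputs: (batch, height, width, channels) feature maps.
        batch_size = tf.shape(inputs)[0]
        channels = inputs.shape[-1]
        # Flatten the spatial grid into a sequence of patches.
        patches = tf.reshape(inputs, (batch_size, -1, channels))
        token = tf.tile(self.class_token, (batch_size, 1, 1))
        # Cross-attention: the class token attends to every patch;
        # the scores show which patches drive the decision.
        pooled, scores = self.attention(
            query=token,
            value=patches,
            key=patches,
            return_attention_scores=True,
        )
        return self.norm(pooled[:, 0]), scores
```

Dropping a layer like this in after the final conv stage takes the place of global average pooling; reshaping `scores` back onto the feature-map grid gives an attention overlay of the kind shown in the demo.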
Link to the tutorial: Augmenting convnets with aggregated attention
@Ritwik_Raha, @Devjyoti_Chakraborty, and I have built a Hugging Face demo around this example for all of you to try. In the demo we use a model that was trained on the Imagenette dataset.
Link to the demo: Augmenting CNNs with attention-based aggregation - a Hugging Face Space by keras-io
I would like to thank JarvisLabs.ai for providing me with GPU credits for this project.