I am glad to present my implementation of the “Fastformer: Additive Attention Can Be All You Need” paper.
This is a Transformer variant based on additive attention that can handle long sequences efficiently with linear complexity. Fastformer is much more efficient than many existing Transformer models while achieving comparable or even better long-text modeling performance.
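
To give a flavor of the idea, here is a minimal single-head PyTorch sketch of the additive attention mechanism as I read it from the paper. This is not the exact code in this repository; the module and parameter names (`AdditiveAttention`, `to_query`, `w_q`, `w_k`, `to_out`) are illustrative placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdditiveAttention(nn.Module):
    """Single-head sketch of Fastformer-style additive attention (illustrative)."""

    def __init__(self, dim: int):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_query = nn.Linear(dim, dim)
        self.to_key = nn.Linear(dim, dim)
        self.to_value = nn.Linear(dim, dim)
        # learnable vectors that score each position (additive attention)
        self.w_q = nn.Parameter(torch.randn(dim))
        self.w_k = nn.Parameter(torch.randn(dim))
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.to_query(x)  # (B, N, D)
        k = self.to_key(x)
        v = self.to_value(x)

        # summarize all queries into one global query vector -- O(N), not O(N^2)
        alpha = F.softmax(q @ self.w_q * self.scale, dim=-1)  # (B, N)
        global_q = torch.einsum('bn,bnd->bd', alpha, q)       # (B, D)

        # element-wise interaction between the global query and each key
        p = k * global_q.unsqueeze(1)                          # (B, N, D)
        beta = F.softmax(p @ self.w_k * self.scale, dim=-1)    # (B, N)
        global_k = torch.einsum('bn,bnd->bd', beta, p)         # (B, D)

        # element-wise interaction between the global key and each value,
        # then a linear transform plus a residual connection to the queries
        u = v * global_k.unsqueeze(1)                          # (B, N, D)
        return self.to_out(u) + q


if __name__ == "__main__":
    x = torch.randn(2, 128, 64)      # (batch, seq_len, dim)
    attn = AdditiveAttention(dim=64)
    print(attn(x).shape)             # torch.Size([2, 128, 64])
```

Because each position is reduced to a single global query and global key vector, every step costs O(N·d) instead of the O(N²·d) of standard self-attention, which is where the linear complexity comes from.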