Implementing Compositional Attention

Here is my TF/Keras implementation of the recent Compositional Attention paper by MILA, which disentangles the search and retrieval components of the attention mechanism. It can be used as a drop-in replacement for standard multi-head attention and outperforms it on some tasks.
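
For anyone who wants a quick feel for the idea, below is a rough NumPy sketch of the mechanism as I understand it from the paper. It is not the TF/Keras code from the implementation, and it leaves out batching, masking, dropout, and biases. The names `init_params` and `compositional_attention`, the parameter shapes, and the choice of a single shared retrieval-key projection are assumptions made for illustration. Each of the S "searches" produces an attention pattern, each of the R "retrievals" produces a set of values, and a second softmax lets every search pick, per position, which retrieval to read from.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(d_model, n_search, n_retrieve, d_head, d_ret, seed=0):
    # Hypothetical helper: random weights with the shapes used below.
    rng = np.random.default_rng(seed)
    def w(*shape):
        return rng.normal(scale=shape[-2] ** -0.5, size=shape)
    return (
        w(n_search, d_model, d_head),    # Wq:  search queries
        w(n_search, d_model, d_head),    # Wk:  search keys
        w(n_retrieve, d_model, d_head),  # Wv:  retrieval values
        w(n_search, d_model, d_ret),     # Wrq: retrieval-selection queries
        w(d_head, d_ret),                # Wrk: retrieval-selection keys (shared)
        w(n_search * d_head, d_model),   # Wo:  output projection
    )

def compositional_attention(x, params):
    """x has shape (T, d_model); returns (output, search_attn, retrieval_weights)."""
    Wq, Wk, Wv, Wrq, Wrk, Wo = params
    d_head, d_ret = Wq.shape[-1], Wrq.shape[-1]

    # Search: S independent attention patterns over the sequence.
    q = np.einsum('td,sdh->sth', x, Wq)
    k = np.einsum('td,sdh->sth', x, Wk)
    attn = softmax(np.einsum('sth,suh->stu', q, k) / np.sqrt(d_head), axis=-1)

    # Retrieval: R independent value projections.
    v = np.einsum('td,rdh->rth', x, Wv)

    # Pair every search with every retrieval: shape (S, R, T, d_head).
    o = np.einsum('stu,ruh->srth', attn, v)

    # Retrieval selection: each search softly chooses among the R retrievals.
    rq = np.einsum('td,sde->ste', x, Wrq)
    rk = np.einsum('srth,he->srte', o, Wrk)
    scores = np.einsum('ste,srte->srt', rq, rk) / np.sqrt(d_ret)
    weights = softmax(scores, axis=1)  # normalized over the R retrievals

    # Mix the retrievals per search, concatenate searches, project out.
    mixed = np.einsum('srt,srth->sth', weights, o)
    concat = mixed.transpose(1, 0, 2).reshape(x.shape[0], -1)
    return concat @ Wo, attn, weights
```

With `n_search` equal to `n_retrieve` and the selection forced to a fixed one-to-one pairing, this would reduce to ordinary multi-head attention, which is why the layer can stand in for it.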

Nice work! Congrats!
