Here is my TF/Keras implementation of the recent Compositional Attention paper from Mila, which disentangles the search and retrieval components of the attention mechanism. It can be used as a drop-in replacement for standard multi-head attention and outperforms it on some tasks. A minimal sketch of the mechanism is included below for anyone who wants the idea at a glance before reading the code.
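The sketch below is not the repo's actual code; the layer name `CompositionalAttention`, the argument names (`num_searches`, `num_retrievals`, `head_dim`), and the exact weight shapes are my own assumptions based on the paper's description. The idea it illustrates: S search heads each produce attention weights, R retrieval heads each project values, every search retrieves with every value head, and a second soft attention lets each search pick which retrieval it keeps.

```python
import tensorflow as tf
from tensorflow import keras


class CompositionalAttention(keras.layers.Layer):
    """Minimal sketch: S searches x R retrievals with soft retrieval selection."""

    def __init__(self, num_searches=4, num_retrievals=2, head_dim=32, **kwargs):
        super().__init__(**kwargs)
        self.S = num_searches    # number of search heads (Q/K pairs)
        self.R = num_retrievals  # number of retrieval heads (V projections)
        self.d = head_dim        # per-head dimension

    def build(self, input_shape):
        model_dim = int(input_shape[-1])
        init = "glorot_uniform"
        # Search projections: one Q and one K per search head.
        self.wq = self.add_weight(name="wq", shape=(self.S, model_dim, self.d), initializer=init)
        self.wk = self.add_weight(name="wk", shape=(self.S, model_dim, self.d), initializer=init)
        # Retrieval projections: one V per retrieval head.
        self.wv = self.add_weight(name="wv", shape=(self.R, model_dim, self.d), initializer=init)
        # Retrieval-selection query (per search) and a shared key projection.
        self.wq_sel = self.add_weight(name="wq_sel", shape=(self.S, model_dim, self.d), initializer=init)
        self.wk_sel = self.add_weight(name="wk_sel", shape=(self.d, self.d), initializer=init)
        # Output projection back to the model dimension.
        self.wo = self.add_weight(name="wo", shape=(self.S * self.d, model_dim), initializer=init)

    def call(self, x):
        # x: (batch, seq_len, model_dim)
        scale = tf.sqrt(tf.cast(self.d, x.dtype))
        q = tf.einsum("btm,smd->bstd", x, self.wq)   # (B, S, T, d)
        k = tf.einsum("btm,smd->bstd", x, self.wk)   # (B, S, T, d)
        v = tf.einsum("btm,rmd->brtd", x, self.wv)   # (B, R, T, d)

        # Search: one attention matrix per search head.
        attn = tf.nn.softmax(tf.einsum("bstd,bsud->bstu", q, k) / scale, axis=-1)

        # Retrieve every value head with every search head: (B, S, R, T, d).
        o = tf.einsum("bstu,brud->bsrtd", attn, v)

        # Retrieval selection: each search softly chooses among the R retrievals.
        q_sel = tf.einsum("btm,smd->bstd", x, self.wq_sel)       # (B, S, T, d)
        k_sel = tf.einsum("bsrtd,de->bsrte", o, self.wk_sel)     # (B, S, R, T, d)
        sel = tf.nn.softmax(
            tf.einsum("bstd,bsrtd->bsrt", q_sel, k_sel) / scale, axis=2)  # over R

        # Weighted sum over retrievals, then merge the search heads.
        out = tf.einsum("bsrt,bsrtd->bstd", sel, o)              # (B, S, T, d)
        out = tf.transpose(out, [0, 2, 1, 3])                    # (B, T, S, d)
        out = tf.reshape(out, [tf.shape(x)[0], tf.shape(x)[1], self.S * self.d])
        return tf.einsum("btn,nm->btm", out, self.wo)            # (B, T, model_dim)


# Quick shape check with a hypothetical configuration.
layer = CompositionalAttention(num_searches=4, num_retrievals=2, head_dim=32)
y = layer(tf.random.normal([2, 10, 64]))  # -> (2, 10, 64)
```

Setting `num_retrievals` equal to `num_searches` and fixing the selection to the identity would recover something close to standard multi-head attention, which is why it works as a drop-in replacement.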
Nice work! Congrats!