Here is my TensorFlow/Keras implementation of the Nyströmformer model, which uses the Nyström method to approximate standard self-attention, allowing Transformers to scale better to long sequences.
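For readers unfamiliar with the idea, here is a minimal NumPy sketch of Nyström-approximated attention. It is not taken from the implementation above. It assumes a single head with no batch dimension, landmarks computed as segment means, and an exact pseudoinverse via `np.linalg.pinv` (the Nyströmformer paper uses an iterative approximation instead). The function names `softmax` and `nystrom_attention` are hypothetical, chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def nystrom_attention(q, k, v, num_landmarks):
    """Approximate softmax(q k^T / sqrt(d)) v using m = num_landmarks landmarks.

    q, k: arrays of shape (n, d); v: array of shape (n, d_v).
    Assumption: n is divisible by num_landmarks.
    """
    n, d = q.shape
    if n % num_landmarks != 0:
        raise ValueError("sequence length must be divisible by num_landmarks")
    scale = 1.0 / np.sqrt(d)
    seg = n // num_landmarks

    # Landmarks: mean of each contiguous segment of queries and keys.
    q_land = q.reshape(num_landmarks, seg, d).mean(axis=1)  # (m, d)
    k_land = k.reshape(num_landmarks, seg, d).mean(axis=1)  # (m, d)

    # Three small softmax kernels replace the full (n, n) attention matrix.
    f = softmax(q @ k_land.T * scale)       # (n, m)
    a = softmax(q_land @ k_land.T * scale)  # (m, m)
    b = softmax(q_land @ k.T * scale)       # (m, n)

    # Multiply right to left so no (n, n) matrix is ever formed.
    return f @ (np.linalg.pinv(a) @ (b @ v))
```

With `m` landmarks, the largest intermediate matrices are `(n, m)` and `(m, n)`, so memory and compute grow linearly in the sequence length `n` for a fixed `m`, instead of quadratically as in standard self-attention.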
Very interesting!
Thanks for sharing!