Here’s my implementation of MLP-Mixer, the all-MLP architecture for computer vision that uses neither convolutions nor self-attention.
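For reference, the core Mixer block alternates a token-mixing MLP (applied across the patch axis) with a channel-mixing MLP (applied across the feature axis), each wrapped in LayerNorm and a skip connection. Below is a minimal Keras sketch of one such block; the dimensions in the usage example are hypothetical, not the ones used in this repo.

```python
import tensorflow as tf
from tensorflow.keras import layers

def mlp(x, hidden_dim, output_dim):
    # Two-layer MLP with GELU, as in the MLP-Mixer paper.
    x = layers.Dense(hidden_dim, activation="gelu")(x)
    return layers.Dense(output_dim)(x)

def mixer_block(x, num_patches, hidden_dim, tokens_mlp_dim, channels_mlp_dim):
    # Token mixing: transpose so the MLP acts across patches.
    y = layers.LayerNormalization()(x)
    y = layers.Permute((2, 1))(y)              # (batch, channels, patches)
    y = mlp(y, tokens_mlp_dim, num_patches)
    y = layers.Permute((2, 1))(y)              # back to (batch, patches, channels)
    x = x + y                                  # skip connection
    # Channel mixing: the MLP acts across features, per patch.
    y = layers.LayerNormalization()(x)
    y = mlp(y, channels_mlp_dim, hidden_dim)
    return x + y                               # skip connection

# Hypothetical usage with 64 patches and a hidden width of 128.
inputs = tf.keras.Input(shape=(64, 128))
outputs = mixer_block(inputs, num_patches=64, hidden_dim=128,
                      tokens_mlp_dim=256, channels_mlp_dim=512)
```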
Here’s what is included:
- Distributed training with mixed precision (see the training sketch after this list).
- Visualization of the token-mixing MLP weights (a plotting sketch follows below).
- A TensorBoard callback to keep track of the learned linear projections of the image patches (a callback sketch follows below).
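A minimal sketch of how distributed mixed-precision training is typically wired up in Keras, assuming `build_mixer_model()`, `train_ds`, and `val_ds` are hypothetical placeholders defined elsewhere in the repo:

```python
import tensorflow as tf
from tensorflow.keras import mixed_precision

# Compute in float16, keep variables in float32; Keras adds loss scaling
# automatically when compiling under this policy.
mixed_precision.set_global_policy("mixed_float16")

# MirroredStrategy replicates the model across all visible GPUs.
strategy = tf.distribute.MirroredStrategy()
print(f"Number of replicas: {strategy.num_replicas_in_sync}")

with strategy.scope():
    model = build_mixer_model()  # hypothetical builder returning a keras.Model
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )

model.fit(train_ds, validation_data=val_ds, epochs=10)
```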
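One way to visualize the token-mixing weights is to reshape each hidden unit's incoming weight vector back into the patch grid. The sketch below assumes a trained `model` is in scope, that the first token-mixing Dense layer can be fetched under the hypothetical name `"token_mixing_0"`, and that the patch grid is square:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical layer name; adjust to however the layers are named.
kernel = model.get_layer("token_mixing_0").get_weights()[0]
# kernel shape: (num_patches, tokens_mlp_dim); column i holds the weights
# flowing from every patch into hidden unit i.

grid = int(np.sqrt(kernel.shape[0]))  # assumes a square patch grid
fig, axes = plt.subplots(4, 4, figsize=(8, 8))
for i, ax in enumerate(axes.flat):
    ax.imshow(kernel[:, i].reshape(grid, grid), cmap="viridis")
    ax.axis("off")
plt.suptitle("Token-mixing MLP weights (first 16 hidden units)")
plt.show()
```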
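And a sketch of a custom Keras callback that logs the patch-projection kernel to TensorBoard as an image at the end of each epoch. The layer name `"patch_projection"` is an assumption; the kernel is min-max normalized so it renders cleanly:

```python
import tensorflow as tf

class PatchProjectionLogger(tf.keras.callbacks.Callback):
    """Logs the learned patch-projection kernel to TensorBoard each epoch.

    Assumes the patch embedding layer is named "patch_projection"
    (hypothetical) and exposes a 2-D kernel as its first weight.
    """

    def __init__(self, log_dir):
        super().__init__()
        self.writer = tf.summary.create_file_writer(log_dir)

    def on_epoch_end(self, epoch, logs=None):
        kernel = self.model.get_layer("patch_projection").get_weights()[0]
        # Normalize to [0, 1] so the kernel renders as a grayscale image.
        kernel = (kernel - kernel.min()) / (kernel.max() - kernel.min() + 1e-8)
        image = tf.reshape(kernel, (1, kernel.shape[0], kernel.shape[1], 1))
        with self.writer.as_default():
            tf.summary.image("patch_projection_kernel", image, step=epoch)
```

Usage would be along the lines of `model.fit(..., callbacks=[PatchProjectionLogger("logs/projections")])`.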
Results are quite competitive, with room for improvement on the interpretability front.