I contributed this collection of 6 ConvMixer models pre-trained on the ImageNet-1K dataset. They are available for fine-tuning as well as image classification. The models also come with a tutorial to help you get started in <5 minutes.
ConvMixer is a simple model that uses only standard convolutions to achieve the mixing steps. Despite its simplicity, the authors report that ConvMixer outperforms ViT, MLP-Mixer, and some of their variants for similar parameter counts and dataset sizes. ConvMixer operates directly on patches as input, separates the mixing of spatial and channel dimensions, and maintains equal size and resolution throughout the network.
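To make the "equal size and resolution" point concrete, here is a rough shape walk-through in plain Python. It is only a sketch, not the code behind the released models: the helper names are made up for illustration, and the configuration (224x224 input, 1536 channels, depth 20, patch size 7, kernel size 9) is an assumed example. It tracks tensor shapes only, so normalization, activations, and residual connections are left out because they do not change the shape.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (height, width, channels)


def patch_embedding(shape: Shape, dim: int, patch_size: int) -> Shape:
    """Convolution with kernel == stride == patch_size: one output position per patch."""
    h, w, _ = shape
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    return (h // patch_size, w // patch_size, dim)


def depthwise_conv_same(shape: Shape, kernel_size: int) -> Shape:
    """Spatial mixing: per-channel convolution, stride 1, 'same' padding."""
    h, w, c = shape
    total_pad = kernel_size - 1  # 'same' padding at stride 1
    out_h = h + total_pad - kernel_size + 1
    out_w = w + total_pad - kernel_size + 1
    return (out_h, out_w, c)


def pointwise_conv(shape: Shape, dim: int) -> Shape:
    """Channel mixing: 1x1 convolution, spatial size untouched."""
    h, w, _ = shape
    return (h, w, dim)


def trace_convmixer(image: Shape, dim: int, depth: int,
                    patch_size: int, kernel_size: int) -> List[Shape]:
    """Return the feature-map shape after the patch embedding and after each block."""
    shapes = [patch_embedding(image, dim, patch_size)]
    for _ in range(depth):
        x = depthwise_conv_same(shapes[-1], kernel_size)  # mixes spatial locations
        x = pointwise_conv(x, dim)                        # mixes channels
        shapes.append(x)
    return shapes


# Assumed example configuration, chosen only for illustration.
shapes = trace_convmixer((224, 224, 3), dim=1536, depth=20,
                         patch_size=7, kernel_size=9)
print(shapes[0], shapes[-1])
```

Every entry in `shapes` is identical: after the patch embedding, no block changes the height, width, or channel count. This is the isotropic design the description refers to, in contrast to pyramid-style networks that downsample as they go deeper.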
The associated GitHub repo can be found here:
You might also want to take a look at the ConvMixer implementations by @Sayak_Paul here and by @anon26514083 here.