SwitchNet is a fast-transform (not transformer) based neural network that uses the fast Walsh-Hadamard transform (WHT).
Such fast transforms have an equivalent matrix form that you could view as a fixed neural network weight matrix.
Of course, something must be made adjustable, but you can do that!
The gain is full connectivity at a cost of n·log2(n) operations instead of the normal n² operations for an actual matrix.
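For concreteness, here is a minimal NumPy sketch of the in-place WHT butterfly (the function name fwht and the NumPy framing are mine, just for illustration):

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform.
    len(x) must be a power of two; total cost is n*log2(n) adds/subtracts,
    yet every output depends on every input (full connectivity)."""
    x = np.asarray(x, dtype=np.float64).copy()
    n = x.shape[0]
    h = 1
    while h < n:
        # Butterfly stage: pairs at stride h become (a + b, a - b).
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            x[i:i + h] = a + x[i + h:i + 2 * h]
            x[i + h:i + 2 * h] = a - x[i + h:i + 2 * h]
        h *= 2
    return x

print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))  # [ 4.  2.  0. -2.  0.  2.  0.  2.]
```

Each of the log2(n) stages does n adds/subtracts, which is where the n·log2(n) count comes from.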
I see TensorFlow has the WHT in at least some form now, via TensorFlow Federated:
tff.aggregators.HadamardTransformFactory
TensorFlow does have CReLU (tf.nn.crelu) and probably the other pieces you need.
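For reference, CReLU just concatenates ReLU applied to x and to -x, doubling the width so no sign information is lost before the next fixed transform. A plain NumPy sketch (the function name is mine):

```python
import numpy as np

def crelu(x):
    """Concatenated ReLU: [relu(x), relu(-x)].
    Doubles the width but keeps the sign information that a plain ReLU
    would discard before the next fixed-transform layer."""
    x = np.asarray(x)
    return np.concatenate([np.maximum(x, 0), np.maximum(-x, 0)])

print(crelu([2.0, -3.0, 0.5]))  # [2.  0.  0.5 0.  3.  0. ]
```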
Here is some JavaScript code:
https://editor.p5js.org/siobhan.491/sketches/RvqZfikaE
And a blog post about SwitchNet:
https://ai462qqq.blogspot.com/2023/04/switch-net.html
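As a sketch of my own reading of that post (the layer structure, names, and the two-slope "switch" activation below are my assumptions, not code from the blog), one such layer could pair the fixed WHT with a small set of learnable per-element slopes:

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform (same butterfly as above)."""
    x = np.asarray(x, dtype=np.float64).copy()
    n, h = x.shape[0], 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            x[i:i + h] = a + x[i + h:i + 2 * h]
            x[i + h:i + 2 * h] = a - x[i + h:i + 2 * h]
        h *= 2
    return x

def switch_layer(x, pos_slope, neg_slope):
    """Fixed WHT mixing followed by the adjustable part: a per-element
    two-slope 'switch', f(y) = pos_slope*y if y >= 0 else neg_slope*y."""
    y = fwht(x)
    return np.where(y >= 0, pos_slope * y, neg_slope * y)

# Demo with random (stand-in for learned) slopes on a length-8 input.
rng = np.random.default_rng(0)
n = 8
pos, neg = rng.normal(size=n), rng.normal(size=n)
print(switch_layer(rng.normal(size=n), pos, neg))
```

Stacking a few layers like this gives full connectivity at n·log2(n) operations per layer while keeping only O(n) learnable parameters per layer.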
I guess I can look for free GPU/TensorFlow access now that the prerequisites seem to be there. I didn't want to start with TensorFlow only to find I had to spend six months of my life writing an optimized CUDA kernel.