Did you try out variations of the model implementation you shared? Did you get improvements?
Isn’t your dataset a little small?
Can I ask why you used sigmoid as the activation function (except in the output layer) instead of softmax, as is generally used with LeNet?