I’m wondering why all Keras layers use Glorot initialization as the default. Since ReLU is the most popular activation function, shouldn’t He be the default initialization?
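For what it’s worth, you can already override the default on a per-layer basis via the `kernel_initializer` argument; a minimal sketch (layer sizes and shapes below are just placeholders):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(784,)),
    # Override the "glorot_uniform" default with He initialization for the ReLU layer
    layers.Dense(256, activation="relu", kernel_initializer="he_normal"),
    layers.Dense(10, activation="softmax"),
])
```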
The prebuilt application models, such as ResNet50, also use Glorot initialization by default, and there is no parameter you can pass to change it.
Exactly! In my case I’m using the default ResNet50, trained from scratch, and the network is training and converging. My inputs have an arbitrary number of channels, which is why I cannot use the ImageNet weights. However, I’m wondering whether initialization with the He method would improve the results. I noticed a big difference in overfitting from run to run depending on the initial weights of each run.
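To be concrete, my setup looks roughly like this (the spatial size, channel count, and class count below are just placeholders, not my actual values):

```python
from tensorflow.keras.applications import ResNet50

model = ResNet50(
    weights=None,               # training from scratch, so no ImageNet weights
    input_shape=(224, 224, 8),  # arbitrary number of input channels (8 is a placeholder)
    classes=10,                 # placeholder number of output classes
)
```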
Interesting, I wonder how they trained the VGG19 in keras.applications.
Here it is in mid 2016:
It’s probably one of those things that got set at one point when it made sense and then got locked in by backwards compatibility guarantees.
Aside from updating keras.applications to accept initializers as arguments, another possible solution would be for Keras to implement a global “default_initializer” or something like that. Either one would take some work.
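In the meantime, a workaround along these lines should be possible with the tf.keras API (sketch only, untested; the input shape and class count are placeholders): build the model as usual, then re-assign every kernel with a He initializer after construction.

```python
import tensorflow as tf
from tensorflow.keras.applications import ResNet50

model = ResNet50(weights=None, input_shape=(224, 224, 3), classes=10)

# Re-initialize all conv/dense kernels with He initialization after the fact,
# since keras.applications does not expose an initializer argument.
he_init = tf.keras.initializers.HeNormal()
for layer in model.layers:
    # Only conv/dense layers carry a `kernel`; skip BatchNorm, pooling, etc.
    if hasattr(layer, "kernel") and layer.kernel is not None:
        layer.kernel.assign(
            he_init(shape=layer.kernel.shape, dtype=layer.kernel.dtype))
```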