Looking into the source code, I see that:
def __call__(self, x):
    return self.l2 * tf.reduce_sum(tf.square(x))
So if tensor x contains non-zero values, the loss is never zero, and backpropagation tends to reduce it. That means it forces tensor x toward zero, which is obviously not the purpose.
What do I misunderstand?
That’s the purpose of a regularizer: to encourage coefficients to be closer to zero, to the extent that it doesn’t harm the loss minimization significantly.
For L1 regularization, variables that don’t have significant predictive power get set to zero, because the decrease in the regularization penalty is bigger than the loss in predictive quality.
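To make the "set to zero" behaviour concrete, here is a minimal sketch under a simplifying assumption: a single coefficient with a quadratic loss, 0.5 * (w - w_unreg)**2, plus an L1 penalty l1 * abs(w). In that toy case the minimizer is the soft-threshold formula. The function name and the numbers are hypothetical, chosen only for illustration.

```python
def soft_threshold(w_unreg, l1):
    """Minimizer of 0.5 * (w - w_unreg)**2 + l1 * abs(w) for one coefficient."""
    # Shrink the magnitude by l1, but never past zero.
    magnitude = max(abs(w_unreg) - l1, 0.0)
    return magnitude if w_unreg >= 0 else -magnitude

# Hypothetical unregularized coefficients: one strong predictor, one weak.
strong = soft_threshold(3.0, l1=0.5)  # shrinks to 2.5
weak = soft_threshold(0.3, l1=0.5)    # set exactly to 0.0
print(strong, weak)
```

The weak coefficient is zeroed because removing it saves more penalty than it costs in loss, while the strong one is only shrunk.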
For L2 regularization, it encourages collinear variables to get similar coefficients (e.g. 4.8 and 5.2, sum of squares = 50.08), rather than one large and one small (-2.0 and 12.0, sum of squares = 148.0).
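The arithmetic is easy to check with a small sketch that mirrors the sum-of-squares formula from the question, written in plain Python rather than TensorFlow. The helper name l2_penalty is made up for this example, and l2 is set to 1.0 so the result is just the sum of squares.

```python
def l2_penalty(coeffs, l2=1.0):
    """L2 penalty: l2 times the sum of squared coefficients."""
    return l2 * sum(c * c for c in coeffs)

# Both pairs add up to 10.0, so for two perfectly collinear inputs
# they would give the same prediction, but the penalties differ.
balanced = l2_penalty([4.8, 5.2])    # about 50.08
lopsided = l2_penalty([-2.0, 12.0])  # 148.0
print(balanced, lopsided)
```

Since the predictions are the same in that collinear case, the optimizer prefers the balanced pair, which has the much smaller penalty.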
The size of the regularization coefficient determines the importance of loss minimization relative to having small or zero valued coefficients.
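As a rough illustration of that trade-off, assume a single weight with loss (w - w_unreg)**2 plus penalty l2 * w**2. Setting the derivative to zero gives w = w_unreg / (1 + l2). This is a toy model, not what any particular optimizer computes, and the names and values below are hypothetical.

```python
def ridge_minimizer(w_unreg, l2):
    """Minimizer of (w - w_unreg)**2 + l2 * w**2 for one weight."""
    # d/dw: 2 * (w - w_unreg) + 2 * l2 * w = 0  =>  w = w_unreg / (1 + l2)
    return w_unreg / (1.0 + l2)

w_unreg = 5.0  # hypothetical weight that minimizes the loss alone
shrunk = [ridge_minimizer(w_unreg, l2) for l2 in (0.0, 0.1, 1.0, 100.0)]
print(shrunk)
```

With l2 = 0 the weight stays at 5.0; at l2 = 1 it is halved to 2.5; at l2 = 100 it is pushed almost to zero, even though the loss alone would prefer 5.0.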
@Robert_Pope,
Thank you very much for your response. As I understand it, the regularization loss “competes” with the cost function loss. In that case, if I put too much weight on the regularization loss, it might force the weights down to zero regardless of the cost function. Is that correct?