Customize Keras Loss to take Mean of the Regularizations instead of the Sum

I was writing some simple models and did not want the model loss to include the sum of all the L2 regularization terms; I wanted it to include their mean instead. My reasoning is that having three L2 losses had a huge regularizing impact, and taking the mean reduces that impact. In most courses, we take the mean as well.

Any ideas on how to approach this in a way that generalizes well?

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(100, input_shape=(8,), kernel_regularizer=tf.keras.regularizers.L2(0.01)))
model.add(Dense(80, kernel_regularizer=tf.keras.regularizers.L2(0.01)))
model.add(Dense(30, kernel_regularizer=tf.keras.regularizers.L2(0.01)))
# Sigmoid output so that binary_crossentropy receives probabilities.
model.add(Dense(1, activation="sigmoid"))

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# One scalar regularization term per regularized layer.
print(model.losses)
```

```
[<tf.Tensor: shape=(), dtype=float32, numpy=0.15066518>, <tf.Tensor: shape=(), dtype=float32, numpy=0.883246>, <tf.Tensor: shape=(), dtype=float32, numpy=0.4300898>]
```

I would like the total loss to add (0.15066518 + 0.883246 + 0.4300898)/3 instead of (0.15066518 + 0.883246 + 0.4300898).
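To make the requested change concrete, here is a minimal sketch of both reductions on the three values printed above: `tf.add_n` gives the plain sum and `tf.reduce_mean` over the stacked scalars gives the mean. The constants are copied from the printed `model.losses`; they depend on the random weight initialization, so your own values will differ.

```python
import tensorflow as tf

# The three regularization terms printed by model.losses above.
reg_losses = [tf.constant(0.15066518), tf.constant(0.883246), tf.constant(0.4300898)]

# Plain sum of all terms (what currently gets added to the data loss).
summed = tf.add_n(reg_losses)

# Mean over the K = 3 regularized layers (what the question asks for).
averaged = tf.reduce_mean(tf.stack(reg_losses))

print(float(summed))    # about 1.4640
print(float(averaged))  # about 0.4880
```

With three regularized layers the mean is exactly one third of the sum, so the regularization pressure drops by a factor of three.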

Do you just want to use `Reduction.NONE`, like in

https://github.com/keras-team/keras/blob/master/keras/losses.py#L546-L547

And then apply your custom operation?
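For reference, a minimal sketch of what `Reduction.NONE` does, on hypothetical toy labels and predictions: the loss object returns one value per sample and leaves the batch reduction to you. Note that this only concerns the data term; the regularization terms in `model.losses` are added separately by Keras, so this option alone does not change how they are combined. The string `"none"` is used because it is the value of `Reduction.NONE` in tf.keras 2.x and, as far as I know, is also accepted by newer Keras versions.

```python
import tensorflow as tf

# reduction="none" keeps one loss value per sample instead of averaging over the batch.
bce = tf.keras.losses.BinaryCrossentropy(reduction="none")

# Hypothetical toy batch of 4 samples (probabilities, not logits).
y_true = tf.constant([[1.0], [0.0], [1.0], [0.0]])
y_pred = tf.constant([[0.9], [0.2], [0.6], [0.4]])

per_sample = bce(y_true, y_pred)  # shape (4,): one BCE value per sample

# The custom operation is then up to you; here, an explicit batch mean.
data_loss = tf.reduce_mean(per_sample)
print(per_sample.numpy(), float(data_loss))
```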

I still want the binary cross-entropy to be part of the loss function.

I essentially want my loss function to be

L = BCE + Avg(Regularization)

The current implementation in Keras is L = BCE + Sum(Regularization).
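Written out with the penalty that `tf.keras.regularizers.L2(l2)` computes (the coefficient times the sum of squared weights), the two objectives differ only by a factor 1/K, where K is the number of regularized layers (K = 3 in the model above) and λ_k is the coefficient of layer k. Averaging is therefore equivalent to dividing every λ_k by K:

$$
\begin{aligned}
\mathcal{L}_{\text{Keras}} &= \mathrm{BCE} + \sum_{k=1}^{K} \lambda_k \lVert W_k \rVert_F^2 \\
\mathcal{L}_{\text{wanted}} &= \mathrm{BCE} + \frac{1}{K}\sum_{k=1}^{K} \lambda_k \lVert W_k \rVert_F^2 = \mathrm{BCE} + \sum_{k=1}^{K} \frac{\lambda_k}{K} \lVert W_k \rVert_F^2
\end{aligned}
$$

With λ_k = 0.01 and K = 3, this corresponds to an effective coefficient of about 0.0033 per layer.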

For the regularization penalty, the sum is hard-coded in the code that builds the loss when you compile the model:

https://github.com/keras-team/keras/blob/master/keras/engine/compile_utils.py#L231

As a workaround, you could probably create a custom regularizer that you scale yourself (if you know the total number of regularizers), or you can control your loss in more detail with a custom training loop.
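A minimal sketch of the custom-regularizer workaround, assuming you count the regularized layers by hand. `ScaledL2` and `NUM_REGULARIZED` are hypothetical names: the class divides the usual L2 penalty (`l2 * sum(square(w))`) by that count, so the sum Keras computes equals the mean of the unscaled penalties. Passing `tf.keras.regularizers.L2(0.01 / NUM_REGULARIZED)` to each layer would have the same effect.

```python
import tensorflow as tf

class ScaledL2(tf.keras.regularizers.Regularizer):
    """Hypothetical helper: L2 penalty divided by the number of regularized layers."""

    def __init__(self, l2=0.01, num_regularizers=1):
        self.l2 = l2
        self.num_regularizers = num_regularizers

    def __call__(self, x):
        # Same formula as the built-in L2 regularizer, then scaled by 1/K.
        return self.l2 * tf.reduce_sum(tf.square(x)) / self.num_regularizers

    def get_config(self):
        return {"l2": self.l2, "num_regularizers": self.num_regularizers}

# Must be counted by hand and kept in sync with the model (the weak point of this approach).
NUM_REGULARIZED = 3

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(100, kernel_regularizer=ScaledL2(0.01, NUM_REGULARIZED)),
    tf.keras.layers.Dense(80, kernel_regularizer=ScaledL2(0.01, NUM_REGULARIZED)),
    tf.keras.layers.Dense(30, kernel_regularizer=ScaledL2(0.01, NUM_REGULARIZED)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Keras still sums model.losses, but each term is already divided by 3,
# so the sum equals the mean of the unscaled L2 penalties.
print(float(tf.add_n(model.losses)))
```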

Thanks, I was hoping there would be a simple way. Thanks for letting me know.

The custom training step is probably the best way, since I don't know in advance how many layers will require regularization. I want the solution to be generic rather than model-specific.
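A minimal sketch of such a generic training step, written as a plain custom loop with `tf.GradientTape` rather than by overriding `Model.train_step` (whose internals differ between Keras versions). It averages whatever `model.losses` contains, so it does not need to know how many layers are regularized. `averaged_regularization` and the toy data are hypothetical, and the loss passed to `compile()` is not used by this loop.

```python
import tensorflow as tf

# Same architecture as the question, with a sigmoid output for BCE.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(100, kernel_regularizer=tf.keras.regularizers.L2(0.01)),
    tf.keras.layers.Dense(80, kernel_regularizer=tf.keras.regularizers.L2(0.01)),
    tf.keras.layers.Dense(30, kernel_regularizer=tf.keras.regularizers.L2(0.01)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
bce = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam()

def averaged_regularization(model):
    """Mean of all regularization terms the model collected (0.0 if there are none)."""
    if not model.losses:
        return tf.constant(0.0)
    return tf.reduce_mean(tf.stack(model.losses))

def train_step(x, y):
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)
        # L = BCE + Avg(Regularization), for any number of regularized layers.
        loss = bce(y, y_pred) + averaged_regularization(model)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# Hypothetical toy data, only to show the loop running.
x = tf.random.normal((64, 8))
y = tf.cast(tf.random.uniform((64, 1)) > 0.5, tf.float32)
for epoch in range(3):
    loss = train_step(x, y)
    print(f"epoch {epoch}: loss = {float(loss):.4f}")
```

For speed you would normally wrap `train_step` in `tf.function`; it is left eager here to keep the sketch simple. A model without any regularized layers simply gets a zero regularization term.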
