I am writing a custom training loop that trains my model with the Adam optimizer and a hand-written binary cross-entropy loss instead of the built-in tf.keras.losses.BinaryCrossentropy. With my custom loss, the model's loss barely decreases over training, whereas with the built-in loss it drops steadily.
The relevant code and outputs from my notebook are below.
import tensorflow as tf
from tensorflow.keras import backend as K

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1./255),
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation='sigmoid')  # outputs probabilities, not logits
])
class BCE(tf.keras.losses.Loss):
    # @tf.function
    def call(self, y_true, y_pred):
        y_pred = tf.convert_to_tensor(y_pred)
        y_true = tf.cast(y_true, y_pred.dtype)
        # small epsilon added inside the logs to avoid log(0)
        bce = y_true * tf.math.log(y_pred + 1e-07) + (1 - y_true) * tf.math.log(1 - y_pred + 1e-07)
        return -K.mean(bce)
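As a sanity check (a minimal sketch with made-up probabilities, not my real data), the custom loss and the built-in loss agree to several decimal places on a fixed batch, so the forward computation itself looks equivalent:

y_true = tf.constant([[0.], [1.], [1.], [0.]])
y_pred = tf.constant([[0.1], [0.8], [0.6], [0.3]])

print(float(BCE()(y_true, y_pred)))                                  # ~0.299
print(float(tf.keras.losses.BinaryCrossentropy()(y_true, y_pred)))   # ~0.299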
optimizer = tf.keras.optimizers.Adam()
loss_fn = BCE()
@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)  # despite the name, these are sigmoid probabilities
        loss_value = loss_fn(y, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    return loss_value
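For debugging, one way to compare the two losses more directly is to look at the gradients they produce on a single batch (a rough sketch, assuming training_dataset yields (x, y) batches):

x, y = next(iter(training_dataset))
with tf.GradientTape(persistent=True) as tape:
    p = model(x, training=True)
    custom_loss = BCE()(y, p)
    builtin_loss = tf.keras.losses.BinaryCrossentropy()(y, p)
custom_grads = tape.gradient(custom_loss, model.trainable_weights)
builtin_grads = tape.gradient(builtin_loss, model.trainable_weights)
del tape
for gc, gb in zip(custom_grads, builtin_grads):
    print(float(tf.reduce_max(tf.abs(gc - gb))))  # largest gradient gap per weight tensor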
epochs = 10

for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    loss_over_epoch = 0
    # Iterate over the batches of the dataset.
    for x_batch_train, y_batch_train in training_dataset:
        loss_over_epoch += train_step(x_batch_train, y_batch_train)
    print("Loss over the epoch: ", float(loss_over_epoch / len(training_dataset)))
Start of epoch 0
Loss over the epoch: 0.723311185836792
Start of epoch 1
Loss over the epoch: 0.700799822807312
Start of epoch 2
Loss over the epoch: 0.6997634172439575
Start of epoch 3
Loss over the epoch: 0.6988803148269653
Start of epoch 4
Loss over the epoch: 0.6978280544281006
Start of epoch 5
Loss over the epoch: 0.6973050832748413
Start of epoch 6
Loss over the epoch: 0.696485698223114
Start of epoch 7
Loss over the epoch: 0.6961873769760132
Start of epoch 8
Loss over the epoch: 0.6955927014350891
Start of epoch 9
Loss over the epoch: 0.6953248977661133
But if I instead use the built-in loss,

loss_fn = tf.keras.losses.BinaryCrossentropy()

I get the following statistics:
Start of epoch 0
Loss over the epoch: 0.7039625644683838
Start of epoch 1
Loss over the epoch: 0.6281335949897766
Start of epoch 2
Loss over the epoch: 0.5979914665222168
Start of epoch 3
Loss over the epoch: 0.5627148151397705
Start of epoch 4
Loss over the epoch: 0.5050505995750427
Start of epoch 5
Loss over the epoch: 0.4839542806148529
Start of epoch 6
Loss over the epoch: 0.47102412581443787
Start of epoch 7
Loss over the epoch: 0.4398611783981323
Start of epoch 8
Loss over the epoch: 0.41520529985427856
Start of epoch 9
Loss over the epoch: 0.40074270963668823
What mistake am I making in my custom loss function?