So I'm trying to implement a weighted loss function, and I took two different approaches. Both yield the same loss when I take my model's predictions and compute the loss against the ground truths directly. However, when I pass each loss function to model.compile and then call model.fit, the two give vastly different results. Any idea why that could be?
First approach (just a plain Python function returning a closure):
import keras

def get_weighted_loss(pos_weights, neg_weights, epsilon=1e-7):
    def weighted_loss(y_true, y_pred):
        # weighted binary cross-entropy, reduced to a single scalar (mean over the whole batch)
        loss = -1 * keras.ops.mean(
            pos_weights * y_true * keras.ops.log(y_pred + epsilon) +
            neg_weights * (1 - y_true) * keras.ops.log(1 - y_pred + epsilon))
        return loss
    return weighted_loss
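For reference, a quick sanity check of the closure on made-up toy data (the shapes, values, and weights below are purely illustrative, assuming binary labels of shape (batch, 1) and sigmoid outputs):

import numpy as np

# hypothetical toy batch, not my real data
y_true_toy = np.array([[1.0], [0.0], [1.0], [0.0]])
y_pred_toy = np.array([[0.9], [0.2], [0.7], [0.4]])

toy_loss_fn = get_weighted_loss(pos_weights=0.75, neg_weights=0.25)
print(float(toy_loss_fn(y_true_toy, y_pred_toy)))  # one scalar, averaged over the whole toy batch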
Implementation:
def train_model(self, epochs=10):
    freq_pos, freq_neg = dsu.compute_class_frequency(self.y_train)
    # compile with the weighted loss: positives weighted by freq_neg, negatives by freq_pos
    self.model.compile(optimizer=keras.optimizers.AdamW(),
                       loss=self.get_weighted_loss(freq_neg, freq_pos),
                       metrics=['accuracy'])
    # fit on the training data, validating on the held-out set
    history = self.model.fit(self.x_train, self.y_train,
                             validation_data=(self.x_valid, self.y_valid),
                             epochs=epochs, batch_size=32)
    return history
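dsu.compute_class_frequency is a small helper of mine; roughly, it behaves like this simplified sketch (assumed behavior for the single-label case, not the exact implementation):

import numpy as np

def compute_class_frequency(y):
    # simplified assumption of what the real helper computes:
    # fraction of positive labels and fraction of negative labels in the label array
    y = np.asarray(y, dtype=np.float32)
    freq_pos = float(np.mean(y))
    freq_neg = 1.0 - freq_pos
    return freq_pos, freq_neg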
Result:
Great accuracy after about 3 training epochs, roughly 0.95-0.98.
Second approach (subclassing keras.losses.Loss):
import keras

class SingleClassWeightedLoss(keras.losses.Loss):
    def __init__(self, pos_weight, neg_weight, epsilon=1e-7):
        super().__init__(name='WeightedLoss')
        self.pos_weight = pos_weight
        self.neg_weight = neg_weight
        self.epsilon = epsilon

    def call(self, y_true, y_pred):
        # weighted binary cross-entropy, reduced to a single scalar (mean over the whole batch)
        loss = -1 * keras.ops.mean(
            self.pos_weight * y_true * keras.ops.log(y_pred + self.epsilon) +
            self.neg_weight * (1 - y_true) * keras.ops.log(1 - y_pred + self.epsilon))
        return loss
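And the same kind of toy sanity check for the class version (again made-up values, only to show that calling the instance directly should give the same scalar as the closure above, which is what I also see on my real data further down):

import numpy as np

# same hypothetical toy batch as above
y_true_toy = np.array([[1.0], [0.0], [1.0], [0.0]])
y_pred_toy = np.array([[0.9], [0.2], [0.7], [0.4]])

toy_loss_obj = SingleClassWeightedLoss(pos_weight=0.75, neg_weight=0.25)
print(float(toy_loss_obj(y_true_toy, y_pred_toy)))  # matches the closure's value on this toy batch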
Implementation:
def train_model(self, epochs=10):
    freq_pos, freq_neg = dsu.compute_class_frequency(self.y_train)
    # compile with the class-based weighted loss: positives weighted by freq_neg, negatives by freq_pos
    self.model.compile(optimizer=keras.optimizers.AdamW(),
                       loss=lsu.SingleClassWeightedLoss(freq_neg, freq_pos),
                       metrics=['accuracy'])
    # fit on the training data, validating on the held-out set
    history = self.model.fit(self.x_train, self.y_train,
                             validation_data=(self.x_valid, self.y_valid),
                             epochs=epochs, batch_size=32)
    return history
Result:
Terrible accuracy, around 0.40-0.45, even after 10 epochs.
Paradox (computing the loss directly with both approaches):
my_preds = my_nn.batch_predict(my_nn.x_train, normalize=True)
my_ground_truth = my_nn.y_train
my_loss_fn = my_nn.get_weighted_loss(my_neg_freq, my_pos_freq)
loss = my_loss_fn(my_ground_truth, my_preds)
print(loss)
class_loss_fn = lsu.SingleClassWeightedLoss(my_neg_freq, my_pos_freq)
loss = class_loss_fn(my_ground_truth, my_preds)
print(loss)
Both loss functions print the same value.
Why does the first approach give me good results while the second gives terrible results?