Hello,
I was trying to figure out what the label_smoothing parameter does for the “Categorical Crossentropy” loss, and while reading the code I came across this (keras/keras/losses/losses.py at v3.1.1 · keras-team/keras · GitHub):
    if label_smoothing:
        num_classes = ops.cast(ops.shape(y_true)[-1], y_pred.dtype)
        y_true = y_true * (1.0 - label_smoothing) + (
            label_smoothing / num_classes
        )
The calculation of num_classes assumes that the classes are on the last axis (-1), but the categorical_crossentropy function takes “axis” as a parameter precisely to specify which axis corresponds to the classes.
I don’t understand why we don’t just use:
    num_classes = ops.cast(ops.shape(y_true)[axis], y_pred.dtype)
Is there something I’ve misunderstood that explains this, or is it an error?
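To make the concern concrete, here is a minimal NumPy sketch of the same smoothing formula. This is not the Keras code: smooth_labels and its count_axis argument are hypothetical names I made up for illustration, and the shapes are just an assumed example with the classes on axis 1.

```python
import numpy as np


def smooth_labels(y_true, label_smoothing, count_axis):
    """Hypothetical helper mirroring the smoothing formula quoted above.

    count_axis is the axis whose size is used as num_classes.
    """
    num_classes = y_true.shape[count_axis]
    return y_true * (1.0 - label_smoothing) + label_smoothing / num_classes


# One-hot labels with the classes on axis 1 (assumed layout):
# shape (batch=2, classes=3, positions=4).
class_ids = np.array([[0, 1, 2, 0], [2, 2, 1, 0]])
y_true = np.moveaxis(np.eye(3)[class_ids], -1, 1)

class_axis = 1
label_smoothing = 0.1

# num_classes read from the last axis (size 4), as in the quoted snippet.
smoothed_last = smooth_labels(y_true, label_smoothing, count_axis=-1)
# num_classes read from the class axis (size 3), as in my suggestion.
smoothed_axis = smooth_labels(y_true, label_smoothing, count_axis=class_axis)

# A smoothed label distribution should still sum to 1 over the classes.
sum_last = smoothed_last.sum(axis=class_axis)
sum_axis = smoothed_axis.sum(axis=class_axis)
print(sum_last[0, 0], sum_axis[0, 0])
```

In this example the smoothed labels sum to 1 over the class axis only when num_classes is read from that axis. With the last axis they sum to 0.9 + 3 * 0.1 / 4 = 0.975, so they are no longer a proper distribution.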