Categorical Crossentropy Label Smoothing

Hello,

I was trying to figure out what the label_smoothing parameter does in the categorical crossentropy loss, and while reading the code I came across this (keras/keras/losses/losses.py at v3.1.1 · keras-team/keras · GitHub):

if label_smoothing:
    num_classes = ops.cast(ops.shape(y_true)[-1], y_pred.dtype)
    y_true = y_true * (1.0 - label_smoothing) + (
        label_smoothing / num_classes
    )
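
As far as I can tell, this blends the one-hot target with a uniform distribution over the classes. Here is a minimal NumPy sketch of the same arithmetic. The helper name smooth_labels and the example values are mine, not part of Keras:

```python
import numpy as np

def smooth_labels(y_true, label_smoothing):
    # Hypothetical helper mirroring the arithmetic of the snippet above:
    # the class count is read from the last axis of y_true.
    num_classes = y_true.shape[-1]
    return y_true * (1.0 - label_smoothing) + label_smoothing / num_classes

# One sample, three classes, true class is index 2.
y_true = np.array([[0.0, 0.0, 1.0]])
smoothed = smooth_labels(y_true, label_smoothing=0.1)

# Each wrong class gets 0.1 / 3, the true class gets 0.9 + 0.1 / 3,
# so the row still sums to 1.
print(smoothed)
print(smoothed.sum(axis=-1))
```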

The calculation of num_classes assumes that the classes are on the last axis (-1), but the categorical_crossentropy function takes an axis parameter that specifies which axis corresponds to the classes.
I don't understand why we don't just use:
num_classes = ops.cast(ops.shape(y_true)[axis], y_pred.dtype)
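
To make the concern concrete, here is a small NumPy sketch (not the Keras code itself) with the classes on axis 1 instead of the last axis. The helper smooth_along, the shapes, and the label values are assumptions I made up for illustration:

```python
import numpy as np

def smooth_along(y_true, label_smoothing, class_axis):
    # Hypothetical helper: same arithmetic as the Keras snippet, but the
    # axis used to count the classes is passed in explicitly.
    num_classes = y_true.shape[class_axis]
    return y_true * (1.0 - label_smoothing) + label_smoothing / num_classes

# Batch of 2, 3 classes on axis 1, 4 positions on the last axis.
labels = np.array([[0, 1, 2, 0], [2, 2, 1, 0]])
y_true = np.moveaxis(np.eye(3)[labels], -1, 1)  # shape (2, 3, 4)

using_last = smooth_along(y_true, 0.1, class_axis=-1)  # divides by 4
using_axis = smooth_along(y_true, 0.1, class_axis=1)   # divides by 3

# Summing over the class axis: only the second variant gives a
# distribution that sums to 1; the first sums to 0.975.
print(using_last.sum(axis=1))
print(using_axis.sum(axis=1))
```

With the class count taken from the last axis, the smoothed targets no longer sum to 1 over the classes in this layout, which is what prompted my question.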

Is there something I’ve misunderstood that explains this, or is it an error?

Hi @coco,

Sorry for the delayed response.

num_classes = ops.cast(ops.shape(y_true)[-1], y_pred.dtype)

Here [-1] refers to the last dimension of y_true, which is where the classes sit in the usual one-hot encoding. The axis parameter of categorical crossentropy defines the axis along which the loss is computed, not the number of classes. As far as I'm aware, for label smoothing the number of classes should be taken from the dimension that holds the classes, which is the last axis, and using axis to compute num_classes could be incorrect if axis doesn't match the class dimension. That is why [-1] is used.

Hope this helps. Thank you.