BinaryCrossentropy works, but CategoricalCrossentropy does not, maybe a bug?

I’m experimenting with image classification using my own generated toy dataset with only 2 classes.

When using BinaryCrossentropy, the training accuracy can reach 100%.
But after switching to CategoricalCrossentropy with the same base model, the training accuracy gets stuck at around 53%.
CategoricalCrossentropy still performs badly even when I introduce a 3rd class to the dataset.
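For context, here is roughly how the two setups differ. This is only a minimal sketch, not the exact notebook code; build_model_base and the input shape are assumptions based on the linked notebook:

import tensorflow as tf

def build_model_base():
    # Assumed stand-in for the notebook's base model: one Conv2D feature
    # extractor followed by global pooling.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(1, 3, activation='relu', input_shape=(32, 32, 1)),
        tf.keras.layers.GlobalMaxPooling2D(),
    ])

# Binary setup: one sigmoid unit + BinaryCrossentropy (reaches 100%).
binary_model = tf.keras.Sequential([
    build_model_base(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
binary_model.compile('adam', loss='binary_crossentropy', metrics=['accuracy'])

# Categorical setup: two softmax units + (Sparse)CategoricalCrossentropy
# (stuck around 53%).
categorical_model = tf.keras.Sequential([
    build_model_base(),
    tf.keras.layers.Dense(2, activation='softmax'),
])
categorical_model.compile('adam', loss='sparse_categorical_crossentropy',
                          metrics=['accuracy'])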

This experiment can be found in this Kaggle notebook → https://www.kaggle.com/code/thariqnugrohotomo/bug-within-tf-keras-categoricalcrossentropy/notebook

If BinaryCrossentropy can reach 100% training accuracy, why does CategoricalCrossentropy seem stuck at around 53%?

Is something wrong with my code? Or is it a bug within the library?


hey @markdaoust, can you help here?


Wow! Those should be basically mathematically identical. What the heck is going on?

It would take some debugging to work out what’s going wrong here.
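For reference, the identity we'd expect to hold: a two-way softmax collapses to a sigmoid of the logit difference, so two-class categorical crossentropy should reduce to binary crossentropy:

$$\mathrm{softmax}(z_1, z_2)_2 = \frac{e^{z_2}}{e^{z_1} + e^{z_2}} = \sigma(z_2 - z_1)$$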

I have tried looking at the resulting kernel from the Conv2D layer (a sketch of how to inspect it is after this list):

  • With BinaryCrossentropy loss, the CNN kernel learns to detect the top and bottom edges. This feature is useful for counting the number of vertical lines (a greater number of top and bottom edges always implies a greater number of vertical lines).
  • With CategoricalCrossentropy loss, the CNN kernel learns to detect the vertical line itself. This feature is not helpful for counting the number of vertical lines, because the length of a line may vary a lot.
    E.g. two short vertical lines will produce a smaller response than a single very long line.
    Strangely, the model is “stuck” with this kernel and can’t find a better one.
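In case anyone wants to reproduce this, here is a rough sketch of how to pull the kernel out of the trained model. The layer lookup and the bias assumption are mine; adapt them to the actual notebook model:

import tensorflow as tf

# Grab the first Conv2D layer from the trained model (assumes `model` is
# the fitted model from the notebook and the layer uses a bias).
conv = next(l for l in model.layers if isinstance(l, tf.keras.layers.Conv2D))
kernel, bias = conv.get_weights()
print(kernel[:, :, 0, 0])  # spatial weights of the first filter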

Any ideas on what I should investigate next?

Thanks.

Hi, the reason for the discrepancy is that you are comparing two different model structures. When you use binary_crossentropy, the head is Dense(1), while it is Dense(2) when you use sparse_categorical_crossentropy. If you want a like-for-like comparison, you can use the code below as a reference:

import tensorflow as tf
from tensorflow import keras

# Same base model, but with a single sigmoid unit on top; the one
# probability t is expanded into two class probabilities [1 - t, t].
model = build_model_base()
inputs = model.inputs
t = model(inputs)
t = keras.layers.Dense(1, activation='sigmoid')(t)
outputs = tf.concat([1 - t, t], axis=-1)
new_model = tf.keras.Model(inputs=inputs, outputs=outputs)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
new_model.compile(
    optimizer,
    loss='sparse_categorical_crossentropy',
    metrics=['sparse_categorical_accuracy'],
)

new_model.fit(x, y, epochs=20)
plot_training(new_model)

This works the same as using binary_crossentropy.
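If anyone wants to sanity-check the equivalence numerically, something like this should show the two losses matching:

import tensorflow as tf

p = tf.constant([[0.7], [0.2]])   # sigmoid outputs
y = tf.constant([[1.0], [0.0]])   # binary labels

bce = tf.keras.losses.binary_crossentropy(y, p)

# Expand the single probability into two class probabilities, as above.
two_class = tf.concat([1 - p, p], axis=-1)
scce = tf.keras.losses.sparse_categorical_crossentropy(
    tf.cast(tf.squeeze(y, axis=-1), tf.int32), two_class)

print(bce.numpy(), scce.numpy())  # both ≈ [0.3567 0.2231]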

Hi @chenmoney,
Thanks a lot for having a look at my issue. Those are very interesting findings.

But the reason I want to switch from binary crossentropy to categorical crossentropy is simply that I want to add more classes to my dataset (a 3rd class, a 4th class, etc.).
I’m not sure how to adapt your code so the model can classify more than 2 classes. Would you mind explaining?

Thanks again!

Dense(num_classes) seems correct to me. This sounds like a bug. Thariq, would you mind trying a custom training loop to see if that helps?
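In case it helps, here is a minimal sketch of such a custom training loop. build_model_base, x, and y are assumptions carried over from the notebook; num_classes would become 3, 4, etc. as classes are added:

import tensorflow as tf

num_classes = 2  # assumed; raise this once more classes exist

model = tf.keras.Sequential([
    build_model_base(),
    tf.keras.layers.Dense(num_classes, activation='softmax'),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)

for epoch in range(20):
    for x_batch, y_batch in dataset:
        with tf.GradientTape() as tape:
            probs = model(x_batch, training=True)
            loss = loss_fn(y_batch, probs)
        # Apply gradients manually so each step can be inspected.
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
    print(f'epoch {epoch}: last batch loss = {float(loss):.4f}')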
