BinaryCrossentropy works, but CategoricalCrossentropy does not, maybe a bug?

I’m experimenting with image classification using my own generated toy dataset with only 2 classes.

When using BinaryCrossentropy, the training accuracy can reach 100%.
But after switching to CategoricalCrossentropy with the same base model, the training accuracy gets stuck at around 53%.
CategoricalCrossentropy still performs badly even when I introduce a 3rd class to the dataset.
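For context, here is roughly how the two setups differ. This is only a minimal sketch, not the exact notebook code; build_model_base and the input shape are assumptions based on the linked notebook:

import tensorflow as tf

def build_model_base():
    # Assumed stand-in for the notebook's base model: one Conv2D feature
    # extractor followed by global pooling.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(1, 3, activation='relu', input_shape=(32, 32, 1)),
        tf.keras.layers.GlobalMaxPooling2D(),
    ])

# Binary setup: one sigmoid unit + BinaryCrossentropy (reaches 100%).
binary_model = tf.keras.Sequential([
    build_model_base(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
binary_model.compile('adam', loss='binary_crossentropy', metrics=['accuracy'])

# Categorical setup: two softmax units + (Sparse)CategoricalCrossentropy
# (stuck around 53%).
categorical_model = tf.keras.Sequential([
    build_model_base(),
    tf.keras.layers.Dense(2, activation='softmax'),
])
categorical_model.compile('adam', loss='sparse_categorical_crossentropy',
                          metrics=['accuracy'])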

This experiment can be found in this Kaggle notebook → https://www.kaggle.com/code/thariqnugrohotomo/bug-within-tf-keras-categoricalcrossentropy/notebook

If BinaryCrossentropy can reach 100% training accuracy, why does CategoricalCrossentropy seem stuck at around 53%?

Is something wrong with my code? Or is it a bug within the library?


hey @markdaoust, can you help here?


Wow! Those should be basically mathematically identical. What the heck is going on?

It would take some debugging to work out what’s going wrong here.
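For reference, the identity we'd expect to hold: a two-way softmax collapses to a sigmoid of the logit difference, so two-class categorical crossentropy should reduce to binary crossentropy:

$$\mathrm{softmax}(z_1, z_2)_2 = \frac{e^{z_2}}{e^{z_1} + e^{z_2}} = \sigma(z_2 - z_1)$$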

I have tried looking at the resulting kernel from the Conv2D layer (a sketch of how to inspect it is after this list):

  • With BinaryCrossentropy loss, the CNN kernel learns to detect the top and bottom edges. This feature is useful for counting the number of vertical lines (a greater number of top and bottom edges always implies a greater number of vertical lines).
  • With CategoricalCrossentropy loss, the CNN kernel learns to detect the vertical line itself. This feature is not helpful for counting the number of vertical lines, because the length of a line may vary a lot.
    E.g. two short vertical lines will produce a smaller response than a single very long line.
    Strangely, the model is “stuck” with this kernel and can’t find a better one.
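In case anyone wants to reproduce this, here is a rough sketch of how to pull the kernel out of the trained model. The layer lookup and the bias assumption are mine; adapt them to the actual notebook model:

import tensorflow as tf

# Grab the first Conv2D layer from the trained model (assumes `model` is
# the fitted model from the notebook and the layer uses a bias).
conv = next(l for l in model.layers if isinstance(l, tf.keras.layers.Conv2D))
kernel, bias = conv.get_weights()
print(kernel[:, :, 0, 0])  # spatial weights of the first filter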

Any ideas on what I should investigate next?

Thanks.

Hi, the reason for the discrepancy is that you are comparing two different model structures. When you use binary_crossentropy, the head is Dense(1), while it is Dense(2) when you use sparse_categorical_crossentropy. If you want a like-for-like comparison, you can use the code below as a reference:

import tensorflow as tf
from tensorflow import keras

# Same base model, but with a single sigmoid unit on top; the one
# probability t is expanded into two class probabilities [1 - t, t].
model = build_model_base()
inputs = model.inputs
t = model(inputs)
t = keras.layers.Dense(1, activation='sigmoid')(t)
outputs = tf.concat([1 - t, t], axis=-1)
new_model = tf.keras.Model(inputs=inputs, outputs=outputs)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
new_model.compile(
    optimizer,
    loss='sparse_categorical_crossentropy',
    metrics=['sparse_categorical_accuracy'],
)

new_model.fit(x, y, epochs=20)
plot_training(new_model)

This works the same as using binary_crossentropy.
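If anyone wants to sanity-check the equivalence numerically, something like this should show the two losses matching:

import tensorflow as tf

p = tf.constant([[0.7], [0.2]])   # sigmoid outputs
y = tf.constant([[1.0], [0.0]])   # binary labels

bce = tf.keras.losses.binary_crossentropy(y, p)

# Expand the single probability into two class probabilities, as above.
two_class = tf.concat([1 - p, p], axis=-1)
scce = tf.keras.losses.sparse_categorical_crossentropy(
    tf.cast(tf.squeeze(y, axis=-1), tf.int32), two_class)

print(bce.numpy(), scce.numpy())  # both ≈ [0.3567 0.2231]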

Hi @chenmoney,
Thanks a lot for having a look at my issue. Those are very interesting findings.

But the reason I want to switch from binary crossentropy to categorical crossentropy is simply that I want to add more classes to my dataset (a 3rd class, a 4th class, etc.).
I’m not sure how to adapt your code so the model can classify more than 2 classes. Would you mind explaining?

Thanks again!

Dense(num_classes) seems correct to me. This sounds like a bug. Thariq, would you mind trying a custom training loop to see if that helps?
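In case it helps, here is a minimal sketch of such a custom training loop. build_model_base, x, and y are assumptions carried over from the notebook; num_classes would become 3, 4, etc. as classes are added:

import tensorflow as tf

num_classes = 2  # assumed; raise this once more classes exist

model = tf.keras.Sequential([
    build_model_base(),
    tf.keras.layers.Dense(num_classes, activation='softmax'),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)

for epoch in range(20):
    for x_batch, y_batch in dataset:
        with tf.GradientTape() as tape:
            probs = model(x_batch, training=True)
            loss = loss_fn(y_batch, probs)
        # Apply gradients manually so each step can be inspected.
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
    print(f'epoch {epoch}: last batch loss = {float(loss):.4f}')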
