I’m experimenting with image classification using my own generated toy dataset with only 2 classes.

When using BinaryCrossentropy, the training accuracy can reach 100%.
But after switching to CategoricalCrossentropy with the same base model, the training accuracy gets stuck at around 53%.
CategoricalCrossentropy still seems to perform badly even when I introduce a 3rd class to the dataset.

I have looked at the resulting kernel from the Conv2D layer:

With BinaryCrossentropy loss, the CNN kernel detects the top and bottom edges. This feature is a useful measure for counting the number of vertical lines (i.e. a greater number of top and bottom edges always implies a greater number of vertical lines).

With CategoricalCrossentropy loss, the CNN kernel detects the vertical line itself. This feature is not helpful for counting the number of vertical lines, because the length of a line may vary a lot.
E.g. two short vertical lines will produce a smaller value than a single very long line.
Strangely, the model is “stuck” with this kernel and can’t find a better one.
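To make the difference between the two features concrete, here is a minimal sketch in plain Python. The images, helper names, and line lengths are made up for illustration and are not taken from the actual dataset or from the learned kernels. It only shows that a summed "line" response grows with line length, while a summed "top edge" response grows with the number of lines.

```python
# Toy illustration (hypothetical images, not the original dataset):
# 1 = line pixel, 0 = background.

def make_image(height, lines):
    """Build a binary image; lines is a list of (column, top_row, length)."""
    width = 1 + max(col for col, _, _ in lines)
    img = [[0] * width for _ in range(height)]
    for col, top, length in lines:
        for r in range(top, top + length):
            img[r][col] = 1
    return img

def line_response(img):
    """Summed response of a 'vertical line' detector: grows with line length."""
    return sum(sum(row) for row in img)

def top_edge_response(img):
    """Summed response of a 'top edge' detector: one hit per line start."""
    hits = 0
    for r, row in enumerate(img):
        for c, v in enumerate(row):
            above = img[r - 1][c] if r > 0 else 0
            if v == 1 and above == 0:
                hits += 1
    return hits

two_short = make_image(10, [(0, 1, 2), (2, 4, 2)])  # two lines, length 2 each
one_long = make_image(10, [(0, 1, 8)])              # one line, length 8

print(line_response(two_short), line_response(one_long))          # 4 8
print(top_edge_response(two_short), top_edge_response(one_long))  # 2 1
```

The line response ranks the single long line above the two short ones, which is the wrong order for a counting task, while the edge response matches the line count.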

Any ideas on what I should investigate next?

Hi, the reason for the discrepancy is that you are comparing two different model structures. With binary_crossentropy you have Dense(1), while with sparse_categorical_crossentropy you have Dense(2). For a like-for-like comparison, you can use the code below as a reference:

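import tensorflow as tf
from tensorflow import keras

# build_model_base(), x, y and plot_training() come from the original notebook.
model = build_model_base()
inputs = model.inputs
t = model(inputs)
# Single sigmoid unit, exactly as in the binary_crossentropy setup.
t = keras.layers.Dense(1, activation='sigmoid')(t)
# Expand to two columns [P(class 0), P(class 1)] so a categorical loss can be used.
outputs = tf.concat([1 - t, t], axis=-1)
new_model = tf.keras.Model(inputs=inputs, outputs=outputs)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
new_model.compile(
    optimizer=optimizer,
    loss='sparse_categorical_crossentropy',
    metrics=['sparse_categorical_accuracy'],
)
new_model.fit(x, y, epochs=20)
# Plot the model that was actually trained.
plot_training(new_model)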
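As a sanity check on why the [1-t, t] construction gives a fair comparison, here is a minimal sketch in plain Python (no TensorFlow). The logit value is an arbitrary assumption and the loss helpers are simplified stand-ins for the Keras losses, without the epsilon clipping Keras applies. It shows that the sparse categorical loss on [1-p, p] equals the binary loss on p, and that a sigmoid unit is the same as a 2-way softmax with one logit fixed at 0.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def binary_crossentropy(y, p):
    """Simplified stand-in for the binary loss on one sample."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def sparse_categorical_crossentropy(y, probs):
    """Simplified stand-in for the sparse categorical loss on one sample."""
    return -math.log(probs[y])

z = 0.7                # arbitrary logit from a Dense(1) head (assumption)
p = sigmoid(z)
two_col = [1 - p, p]   # same construction as tf.concat([1 - t, t], axis=-1)

losses = {}
for label in (0, 1):
    bce = binary_crossentropy(label, p)
    scce = sparse_categorical_crossentropy(label, two_col)
    losses[label] = (bce, scce)
    print(label, bce, scce)

# A 2-way softmax with one logit pinned to 0 reproduces the sigmoid output.
softmax_p = softmax([0.0, z])[1]
print(softmax_p, p)
```

So with this wrapper the loss values match the binary setup, and any remaining difference from a plain Dense(2) softmax head comes from the head's parameterization and initialization rather than from the loss function.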

Hi @chenmoney,
Thanks a lot for having a look at my issue. Those are very interesting findings.

But the reason I want to switch from binary-crossentropy to categorical-crossentropy is simply that I want to add more classes to my dataset (a 3rd class, a 4th class, etc.).
I’m not sure how to adapt your code to make the model classify more than 2 classes. Would you mind explaining?