Why is my model performing this "good"?

Hi, for a university project I am trying to classify cervical cancer images. The dataset can be found on Kaggle (Multi Cancer Dataset | Kaggle) and consists of 25’000 images in total, 5’000 per class.

The best model over 50 epochs provides the following metrics:

| Metric | Value |
| --- | --- |
| Accuracy | 0.9916 |
| Loss | 0.0817 |
| Val Accuracy | 0.9962 |
| Val Loss | 0.0674 |
| Learning Rate | 1.2500e-04 |

With classification report:

| Class | Precision | Recall | F1-Score | Support |
| --- | --- | --- | --- | --- |
| Dyskeratotic | 1.00 | 1.00 | 1.00 | 1000 |
| Koilocytotic | 1.00 | 0.99 | 0.99 | 1000 |
| Metaplastic | 1.00 | 1.00 | 1.00 | 1000 |
| Parabasal | 1.00 | 1.00 | 1.00 | 1000 |
| Superficial-Intermediate | 1.00 | 1.00 | 1.00 | 1000 |
| Accuracy | | | 1.00 | 5000 |
| Macro Avg | 1.00 | 1.00 | 1.00 | 5000 |
| Weighted Avg | 1.00 | 1.00 | 1.00 | 5000 |

Judging by the graphs below, the model does not seem to overfit:

With the model architecture being really simple…

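import tensorflow as tf
from tensorflow.keras import Sequential, layers, regularizers

# img_height, img_width and num_classes are set elsewhere (not shown here).
model = Sequential([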
  layers.Input(shape=(img_height, img_width, 3)),
  layers.Rescaling(1./255),
  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Dropout(0.2),

  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Dropout(0.3),

  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Dropout(0.4),

  layers.Flatten(),
  layers.Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
  layers.Dropout(0.5),
  layers.Dense(num_classes)
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

… I do not understand how we could achieve these metrics, and I suspect something went wrong, but I do not know what. Although current state-of-the-art classifiers for cervical cancer images reach good accuracy, they are mostly built on a pre-trained model like ResNet.

Thankful for any help regarding the matter.

Hi @smote, there are several possible reasons for this: the images within a class may be very similar to each other while differing strongly between classes, the features may be highly correlated, or the same images may appear in both the training and the test data. Thank you.
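The last of these points, duplicates shared between the training and test data, is the easiest to check directly. Below is a minimal sketch that hashes the raw bytes of every file in two folders and reports files whose content is identical. It assumes the two splits live in separate directories; the function names are made up for illustration and are not part of any library. It only finds byte-identical copies, so augmented or re-encoded versions of the same image would not be detected.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 1 MiB chunks so large images are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def index_by_digest(root: Path) -> dict:
    """Map each digest to the list of files under root that share it."""
    index = defaultdict(list)
    for p in sorted(root.rglob("*")):
        if p.is_file():
            index[file_digest(p)].append(p)
    return index


def find_cross_split_duplicates(train_dir, test_dir):
    """Return (train_files, test_files) pairs whose bytes are identical."""
    train_index = index_by_digest(Path(train_dir))
    test_index = index_by_digest(Path(test_dir))
    shared = train_index.keys() & test_index.keys()
    return [(train_index[d], test_index[d]) for d in sorted(shared)]
```

Calling `find_cross_split_duplicates(train_dir, test_dir)` with your own two folders returns an empty list when no file content is shared. If it returns anything, the test metrics are at least partly measuring memorisation, and the split should be redone so that each original image ends up in only one of the two sets.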
