Why does my validation loss increase, but validation accuracy perfectly matches training accuracy?

I am building a simple 1D convolutional neural network in Keras. Here is the model:

from tensorflow import keras
from tensorflow.keras import layers, models

def build_model():

    model = models.Sequential()
    # Two separable-conv blocks over the (64 timesteps, 20 features) input
    model.add(layers.SeparableConv1D(64, kernel_size=2, activation="relu", input_shape=(64,20)))
    model.add(layers.SeparableConv1D(64, kernel_size=2, activation="relu"))
    model.add(layers.MaxPooling1D(4))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dropout(0.1))
    # Single sigmoid unit for binary classification
    model.add(layers.Dense(1, activation="sigmoid"))

    model.compile(
        optimizer='rmsprop',
        loss='binary_crossentropy',
        metrics=[
            keras.metrics.BinaryAccuracy(),
        ],
    )
    
    #model.summary()
    
    return model

When I train my model on roughly 1500 samples, the training and validation accuracy curves always overlap almost perfectly, as the graph below shows. This makes me think something fishy is going on in my code or in Keras/TensorFlow, because the validation loss increases dramatically and you would expect the accuracy to suffer at least somewhat from that. It looks like the model is massively overfitting yet reporting the training accuracy for both sets, or something along those lines. When I then evaluate on a held-out test set, the accuracy is nowhere near the 85 to 90 percent reported in the graph, but rather ~70%.

Any help is greatly appreciated, I have been stuck on this for the longest time. Below is the training code.

import numpy as np

#Define the number of folds... this will give us an 80/20 split
#(x_train and y_train are the preloaded training arrays)
k = 5
epochs = 100
num_val_samples = len(x_train) // k
scores_binacc = []
scores_precision = []
scores_recall = []
histories = []
#Train the model from scratch on each of the k folds
for i in range(k):
    print('Processing fold #', i)
    val_data = x_train[i * num_val_samples : (i + 1) * num_val_samples]
    val_targets = y_train[i * num_val_samples : (i + 1) * num_val_samples]
    
    print('Validation partition =  ', i * num_val_samples, (i + 1) * num_val_samples)
    print('Training partition 1 = ', 0, i * num_val_samples)
    print('Training partition 2 = ', (i+1) * num_val_samples, len(x_train))
    
    partial_train_data = np.concatenate(
        [
            x_train[:i * num_val_samples],
            x_train[(i+1) * num_val_samples:]
        ], 
        axis=0
    )
    
    partial_train_targets = np.concatenate(
        [
            y_train[:i * num_val_samples],
            y_train[(i+1) * num_val_samples:]
        ],
        axis=0
    )
    
    model = build_model()
    h = model.fit(
        partial_train_data, 
        partial_train_targets, 
        validation_data=(val_data, val_targets),
        epochs=epochs, 
        verbose=1
    )
    
    # evaluate() returns the loss followed by each compiled metric, in order
    val_loss, val_binacc = model.evaluate(val_data, val_targets, verbose=0)
    scores_binacc.append(val_binacc)
    #scores_precision.append(val_precision)  # would need Precision() in the compile metrics
    #scores_recall.append(val_recall)        # would need Recall() in the compile metrics
    histories.append(h)
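
For reference, the graph I mentioned is produced from the collected histories roughly like this (a quick plotting sketch; assumes matplotlib is installed, and the metric keys come from the compiled BinaryAccuracy() metric):

import matplotlib.pyplot as plt

# Average the per-fold curves so train and validation can be compared
train_acc = np.mean([h.history['binary_accuracy'] for h in histories], axis=0)
val_acc = np.mean([h.history['val_binary_accuracy'] for h in histories], axis=0)
train_loss = np.mean([h.history['loss'] for h in histories], axis=0)
val_loss = np.mean([h.history['val_loss'] for h in histories], axis=0)

epochs_range = range(1, epochs + 1)
plt.plot(epochs_range, train_acc, label='train accuracy')
plt.plot(epochs_range, val_acc, label='val accuracy')
plt.plot(epochs_range, train_loss, label='train loss')
plt.plot(epochs_range, val_loss, label='val loss')
plt.xlabel('epoch')
plt.legend()
plt.show()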

Maybe you’re overfitting, but the underlying relationships are simple enough that your validation set still gets decent accuracy even while its loss climbs.
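
A quick numeric illustration of why loss and accuracy can diverge like that (hypothetical numbers, not your data): accuracy only checks which side of 0.5 each prediction lands on, while binary cross-entropy also punishes confidence, so a model that grows more confident on the examples it already gets wrong sees its loss climb with no change in accuracy.

import numpy as np

def binary_crossentropy(y_true, y_pred):
    # Mean binary cross-entropy over a batch
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1., 1., 1., 0.])      # the last example is always misclassified

early = np.array([0.7, 0.7, 0.7, 0.6])   # early training: mildly confident
late  = np.array([0.9, 0.9, 0.9, 0.99])  # later: same mistakes, far more confident

# Accuracy at a 0.5 threshold is 75% in both cases, but the loss more than doubles
print(binary_crossentropy(y_true, early))  # ~0.50
print(binary_crossentropy(y_true, late))   # ~1.23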

I feel like the drop in test accuracy could be caused by shuffling. Are you shuffling your data during training but not your test data? Does the order of samples matter for your problem?
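
If your arrays are ordered (for example by class or by collection time), the contiguous fold slices in your loop won't be representative, and neither will the held-out test set. A minimal sketch of shuffling once before the k-fold split (assumes the x_train and y_train arrays from your snippet):

import numpy as np

# Shuffle once with a fixed seed so every fold sees the same mix of samples
rng = np.random.default_rng(42)
perm = rng.permutation(len(x_train))
x_train = x_train[perm]
y_train = y_train[perm]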

Your dataset is very small, which makes your model prone to overfitting. You should try the following options:

  1. Augment your data.
  2. Reduce the learning rate or use a learning-rate schedule (see the sketch after this list).
  3. Study this paper: "A disciplined approach to neural network hyper-parameters: Part 1 – learning rate, batch size, momentum, and weight decay" (arXiv:1803.09820).
  4. Take a look at the model's predictions and compare them with the ground truth.
  5. Change the metric to F1 score.
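
For option 2, here is a minimal sketch using Keras' built-in ReduceLROnPlateau callback inside your existing fit call; the factor and patience values are just illustrative starting points:

from tensorflow import keras

# Halve the learning rate whenever validation loss stalls for 5 epochs
reduce_lr = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=5,
    min_lr=1e-5,
)

h = model.fit(
    partial_train_data,
    partial_train_targets,
    validation_data=(val_data, val_targets),
    epochs=epochs,
    callbacks=[reduce_lr],
    verbose=1,
)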

Hope the above answers help!
Supachan

This is basically a bump because I’m having the exact same “issue”. In my case, I’m training a Siamese network for face recognition. There’s a lot going on: I’m implementing a custom loss function (contrastive loss) and a custom distance layer for computing the distance between face embeddings.

Anyway, during training my training and validation accuracy are almost exactly the same (training accuracy has a minimal advantage of maybe 1%, but otherwise the curves nearly coincide). However, training loss steadily decreases (as it should) while validation loss increases, just as you showed. Accuracy in training and validation reaches an impressive 92%, but my test accuracy is only about 70%, nowhere near the promised figures.

I’m very puzzled. So, did you ever find the cause of the problem? I’ve gone through my code extensively to find the culprit, because this isn’t normal and I’m almost positive I messed something up. I use a generator (that I wrote myself) because my dataset has 30,000 images and I can’t load them all into RAM. I suspect this might be associated with the problem. Just wanted to know if you found the source of the issue.
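
For reference, my loss follows the usual contrastive formulation (Hadsell et al.); a simplified sketch of what I mean, not my exact code, with the conventional margin of 1.0:

import tensorflow as tf

def contrastive_loss(y_true, distance, margin=1.0):
    # Standard contrastive loss: pull matching pairs (y_true = 1) together,
    # push non-matching pairs (y_true = 0) at least `margin` apart
    y_true = tf.cast(y_true, distance.dtype)
    positive = y_true * tf.square(distance)
    negative = (1.0 - y_true) * tf.square(tf.maximum(margin - distance, 0.0))
    return tf.reduce_mean(positive + negative)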