After following the transfer learning tutorial on TensorFlow's site, I have a question about how model.evaluate()
works in comparison to calculating accuracy by hand.
At the very end, after fine-tuning, in the Evaluation and prediction section, we use model.evaluate()
to calculate the accuracy on the test set as follows:
loss, accuracy = model.evaluate(test_dataset)
print('Test accuracy :', accuracy)
6/6 [==============================] - 2s 217ms/step - loss: 0.0516 - accuracy: 0.9740
Test accuracy : 0.9739583134651184
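For context, the tutorial compiles the model with a loss that expects logits and a plain accuracy metric, roughly like this (paraphrased from the tutorial, so the exact optimizer and metric object may differ):
base_learning_rate = 0.0001
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])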
Next, we generate predictions manually from one batch of images from the test set as part of a visualization exercise:
image_batch, label_batch = test_dataset.as_numpy_iterator().next() #retrieve one batch from the test set
predictions = model.predict_on_batch(image_batch).flatten() #raw logits for that batch
# Apply a sigmoid since our model returns logits
predictions = tf.nn.sigmoid(predictions)
predictions = tf.where(predictions < 0.5, 0, 1)
However, it’s also possible to extend this to generate predictions across the entire test set and compare them to the true labels to get an average accuracy:
all_acc = tf.zeros([], tf.int32) #initialize scalar placeholder to hold all accuracy indicators
for image_batch, label_batch in test_dataset.as_numpy_iterator():
    predictions = model.predict_on_batch(image_batch).flatten() #run batch through model and return logits
    predictions = tf.nn.sigmoid(predictions) #apply sigmoid activation function to transform logits to [0,1]
    predictions = tf.where(predictions < 0.5, 0, 1) #round down or up accordingly since it's a binary classifier
    accuracy = tf.where(tf.equal(predictions, label_batch), 1, 0) #correct is 1 and incorrect is 0
    all_acc = tf.experimental.numpy.append(all_acc, accuracy)
all_acc = all_acc[1:] #drop first placeholder element
avg_acc = tf.reduce_mean(tf.dtypes.cast(all_acc, tf.float16)) #mean of the 0/1 correctness flags
print('My Accuracy:', avg_acc.numpy())
My Accuracy: 0.974
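As a cross-check, the same number can also be computed with Keras's built-in metric instead of the hand-rolled loop. This is just a sketch assuming 0/1 labels and one logit per image, as in the tutorial:
metric = tf.keras.metrics.BinaryAccuracy(threshold=0.5) #counts a prediction as correct if the sigmoid output rounds to the label
for image_batch, label_batch in test_dataset.as_numpy_iterator():
    probs = tf.nn.sigmoid(model.predict_on_batch(image_batch).flatten()) #logits -> probabilities
    metric.update_state(label_batch, probs)
print('BinaryAccuracy:', metric.result().numpy())
By construction this computes the same quantity as the loop above, just without the placeholder/append bookkeeping.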
Now, if model.evaluate() generates predictions by applying a sigmoid to the model's logit outputs and thresholding at 0.5, as the tutorial suggests, my manually calculated accuracy should equal the accuracy reported by TensorFlow's model.evaluate(). That is indeed the case for the tutorial: my accuracy of 0.974 matches the model.evaluate() result. However, when I run the same code on a model trained with the same convolutional base as the tutorial but on a different dataset of Gabor images (not cats and dogs), my accuracy no longer matches the model.evaluate() accuracy:
current_set = set17 #define set to process
all_acc = tf.zeros([], tf.float64) #initialize scalar placeholder to hold all accuracy indicators
loss, acc = model.evaluate(current_set) #now test the model's performance on the test set
for image_batch, label_batch in current_set.as_numpy_iterator():
    predictions = model.predict_on_batch(image_batch).flatten() #run batch through model and return logits
    predictions = tf.nn.sigmoid(predictions) #apply sigmoid activation function to transform logits to [0,1]
    predictions = tf.where(predictions < 0.5, 0, 1) #round down or up accordingly since it's a binary classifier
    accuracy = tf.where(tf.equal(predictions, label_batch), 1, 0) #correct is 1 and incorrect is 0
    all_acc = tf.experimental.numpy.append(all_acc, accuracy)
all_acc = all_acc[1:] #drop first placeholder element
avg_acc = tf.reduce_mean(all_acc)
print('My Accuracy:', avg_acc.numpy())
print('Tf Accuracy:', acc)
My Accuracy: 0.832
Tf Accuracy: 0.675000011920929
Does anyone know why there would be a discrepancy? Does model.evaluate() not apply a sigmoid? Does it use a threshold other than 0.5? Or is it something else I'm not considering? Note that my new model was trained on Gabor images rather than the cats and dogs from the tutorial, but the code is otherwise identical.
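For what it's worth, the two possibilities I can think of could be checked against model.evaluate() directly with something like the following sketch (using the same current_set and 0/1 labels as above):
m_probs = tf.keras.metrics.BinaryAccuracy(threshold=0.5) #0.5 cutoff applied to sigmoid outputs
m_logits = tf.keras.metrics.BinaryAccuracy(threshold=0.5) #same 0.5 cutoff, but fed the raw logits
for image_batch, label_batch in current_set.as_numpy_iterator():
    logits = model.predict_on_batch(image_batch).flatten()
    m_probs.update_state(label_batch, tf.nn.sigmoid(logits))
    m_logits.update_state(label_batch, logits)
print('0.5 cutoff on sigmoid outputs:', m_probs.result().numpy())
print('0.5 cutoff on raw logits:', m_logits.result().numpy())
but I'd still like to understand what model.evaluate() is actually doing under the hood.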
Thank you in advance for any insight!