Different Results for model.evaluate() compared to model()

Zephaniah_Connell · October 21, 2021, 3:29am

Hi. I have trained a MobileNet model and, in the same code, used model.evaluate() on a set of test data to determine its performance. This test indicates nearly 97% accuracy. Here is the code that performs this.

import os
import tensorflow.keras as keras
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import load_model
from tensorflow.keras.callbacks import ModelCheckpoint

image_size_y = 1056 # The height of one input image
image_size_x = 1920 # The width of one input image

# Choose a width multiplier which changes the number of filters per layer
depth_mul = 1.0/8.0

# Set input shape for color images
shape = (image_size_y, image_size_x, 3)

# Import the MobileNet model and set input dimensions and hyperparameters.
model = MobileNet(input_shape=shape, alpha=depth_mul, weights=None, classes=2)

# Setting up the data directory paths
BaseDir = os.path.join('path','to','directory','containing','data')

train_dir = os.path.join(BaseDir,'train')
val_dir = os.path.join(BaseDir,'val')
test_dir = os.path.join(BaseDir,'test')

train_positive_dir = os.path.join(train_dir,'positive')
train_negative_dir = os.path.join(train_dir,'negative')

val_positive_dir = os.path.join(val_dir,'positive')
val_negative_dir = os.path.join(val_dir,'negative')

test_positive_dir = os.path.join(test_dir,'positive')
test_negative_dir = os.path.join(test_dir,'negative')

# Define desired batch size
batchsize = 32

# Only use data augmentation that generates images that could reasonably occur in real-world situations (just scale brightness a bit)
train_datagen = ImageDataGenerator(
    rescale=1./255,
    brightness_range=[0.9,1.1]
)
valid_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

# Create Data Generators for each group of data
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(image_size_x,image_size_y),
    batch_size=batchsize,
    class_mode='categorical'
)

validation_generator = valid_datagen.flow_from_directory(
    val_dir,
    target_size=(image_size_x,image_size_y),
    batch_size=batchsize,
    class_mode='categorical'
)

test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(image_size_x,image_size_y),
    batch_size=batchsize,
    class_mode='categorical'
)

# Compile the model for training
model.compile(
    loss='categorical_crossentropy',
    optimizer='rmsprop',
    metrics=['accuracy']
)

# Save the model at every epoch, overwriting each time, so the final version after the last epoch will remain and can be tested
finalNetwork = os.path.join('path','to','MobileNetsModel.h5')
mcf = ModelCheckpoint(finalNetwork)

# Train the network
history = model.fit(
    train_generator,
    steps_per_epoch=40646 // batchsize,
    epochs=20,
    validation_data=validation_generator,
    validation_steps=5080 // batchsize,
    callbacks=[mcf]
)

# Evaluation on test data of the model after the final epoch of training
saved_model = load_model(finalNetwork)
_, test_acc = saved_model.evaluate(test_generator, verbose=0)
print("Final Model Accuracy = %.1f%%" % (100.0 * test_acc))

keras.backend.clear_session()

And then I created another piece of code to actually use the trained model, but it doesn’t seem to be working. I’m getting nearly 50% true positives and 50% false positives, so only 50% accuracy. Here is that code. Am I performing the inferences wrong in this code? Am I not saving or loading my model properly? Please help!

import os
from matplotlib import image
import tensorflow as tf
from tensorflow.keras.models import load_model

# Load a model that was trained and saved
model = load_model(os.path.join('path','to','MobileNetsModel.h5'))

# Set the directory containing the test images
datadir = os.path.join('directory','containing','jpgs')

# Get the filenames of all the test images
imgNames = os.listdir(datadir)

# Make inferences using the provided model
for imgName in imgNames:
    # Get the image
    img = image.imread(os.path.join(datadir,imgName))

    # Make an inference
    input = tf.convert_to_tensor(img)
    input = tf.image.resize(input,(1056,1920))
    input = input[None,:,:,:]
    input = input/255.0
    output = model(input)
    prob_pos = output.numpy()[0,0]*100
    prob_neg = output.numpy()[0,1]*100

    # Categorize inferences and output to console
    if prob_pos >= prob_neg:
        print(imgName,' is positive')
    else:
        print(imgName,' is negative')
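
A side check that may be worth making on the inference code above: when the `classes` argument is not given, `flow_from_directory` assigns label indices to subdirectory names in alphanumeric order. The sketch below only reproduces that ordering in plain Python for the two folder names used in this thread; it does not call Keras. The authoritative mapping for a real run is `train_generator.class_indices`.

```python
# Subdirectory names as described in the thread's data layout
class_names = ['positive', 'negative']

# Reproduce the alphanumeric ordering used when `classes` is not passed
class_indices = {name: index for index, name in enumerate(sorted(class_names))}

print(class_indices)  # {'negative': 0, 'positive': 1}
```

If that mapping holds for the real directories, column 0 of the model output would be the "negative" score and column 1 the "positive" score, which is the reverse of how `prob_pos` and `prob_neg` are read in the loop.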

lgusm · October 21, 2021, 4:32pm

Hi,

I tried to read all the code but I got lost (maybe I need to sleep a little bit more).

Can you try your data by adapting this Colab: Retraining an Image Classifier | TensorFlow Hub?

Zephaniah_Connell · October 21, 2021, 5:30pm

I modified the post to get rid of any extraneous code. Could you maybe look through it again? I checked out that link, and as far as I could tell I’m doing the same thing. I feel like I’m missing something.

Zephaniah_Connell · October 21, 2021, 11:07pm

Is it possibly because I have used the JPG file format for my images?

lgusm · October 23, 2021, 10:11am

One thing you could do is try to visualize some of the images from the train/evaluate/test data pipeline.

You’re using some very big images with a network that usually works on smaller images. The resize might be changing the image too much.
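
Besides plotting, a quick way to act on this suggestion is to compare the shape of one batch from a generator with the model's input shape. The sketch below uses a NumPy array as a stand-in for `next(train_generator)[0]`, relying on the documented fact that `target_size` is `(height, width)`, so batches come out as `(batch, target_size[0], target_size[1], channels)`. `batch_matches_model` is a hypothetical helper, not part of Keras.

```python
import numpy as np

def batch_matches_model(batch_images, model_input_shape):
    """Return True if the batch's per-image shape equals the model's input shape."""
    return tuple(batch_images.shape[1:]) == tuple(model_input_shape)

image_size_y = 1056  # height
image_size_x = 1920  # width

# Stand-in for next(train_generator)[0] when target_size=(image_size_x, image_size_y)
fake_batch = np.zeros((1, image_size_x, image_size_y, 3), dtype=np.float32)

# Input shape the model in this thread was built with
model_input_shape = (image_size_y, image_size_x, 3)

print(fake_batch.shape[1:], model_input_shape)
print(batch_matches_model(fake_batch, model_input_shape))  # False
```

With a real generator, the same comparison could be made against `model.input_shape[1:]`.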

Sayak_Paul · October 23, 2021, 10:32am

I didn’t look into your code, but a major difference between model.evaluate() and model() is that if you don’t run model(..., training=False) (where ... refers to the inputs), the layers are not going to run in inference mode, which is not ideal for layers like Dropout, BatchNorm, etc.
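
To illustrate why the call mode matters for a layer such as Dropout, here is a small NumPy sketch of inverted dropout. It is an illustration only, not Keras's implementation; the `dropout` function and its arguments are hypothetical names chosen for this example.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: random masking in training mode, identity at inference."""
    if not training:
        return x
    keep = rng.random(x.shape) >= rate   # True where a unit is kept
    return x * keep / (1.0 - rate)       # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
x = np.ones(8)

print(dropout(x, 0.5, training=False, rng=rng))  # identical to x
print(dropout(x, 0.5, training=True, rng=rng))   # each entry is either 0.0 or 2.0
```

In Keras, passing `training=False` explicitly, as in `model(inputs, training=False)`, removes any ambiguity about which mode the layers run in.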

Sayak_Paul · October 23, 2021, 10:34am

Also, @fchollet explains the difference between model.predict() and model(...) in his book:

Zephaniah_Connell · October 24, 2021, 5:21am

I visualized the data generator images and the resolutions were inverted (squished into portrait instead of landscape). I think the fit function automatically then rotated them to match the defined input size for the network. But then the model() operation doesn’t automatically rotate an input for you. So I swapped the x and y dimensions of the data generators. I will update this post after training and trying model() again after this change.

Zephaniah_Connell · October 24, 2021, 5:25am

I don’t think this was the issue, but this was helpful. I will include training=False in my code. Thank you.

Zephaniah_Connell · November 1, 2021, 4:11am

I have confirmed that the dimensions of images in my data generators were flipped. It appears that the fit() and evaluate() functions will automatically rotate images to fit the input of a model for you; however, calling the model directly on an input will not. After fixing the order of my dimensions and retraining, calling the model directly gives me the same accuracy as using evaluate(). Thank you, everyone, for your help.
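
One way to keep the generator and model dimensions from drifting apart is to derive `target_size` from the model rather than retyping it. This is a minimal sketch, assuming a channels-last model whose `model.input_shape` has the form `(None, height, width, channels)`. `target_size_from_input_shape` is a hypothetical helper, and the tuple below stands in for the real `model.input_shape`.

```python
def target_size_from_input_shape(input_shape):
    """Return (height, width) from a channels-last shape (batch, height, width, channels)."""
    _, height, width, _ = input_shape
    return (height, width)

# Stand-in for model.input_shape of the MobileNet built in this thread
input_shape = (None, 1056, 1920, 3)

target_size = target_size_from_input_shape(input_shape)
print(target_size)  # (1056, 1920)
```

The resulting tuple is in the `(height, width)` order that `flow_from_directory(..., target_size=...)` and `tf.image.resize` both expect.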
