Unstable CNN model training

In the beginning, I trained a CNN model to predict cats vs. dogs, and it was unstable during training. I have 2000 photos for training and another 1000 for validation, so I thought the problem was my model, and I loaded a pre-trained ResNet50 and added fully connected layers ending in a softmax with 2 outputs.
It is still unstable.
I changed the batch_size, epochs, and steps_per_epoch many times and it is still unstable.
What should I do, and what is my mistake?

# Dataset paths
from os.path import join as p
from os import getcwd as g
train = p(g(), 'train')
validation = p(g(), 'validation')
train_cat = p(train, 'cat')
train_dog = p(train, 'dog')
validation_cat = p(validation, 'cat')
validation_dog = p(validation, 'dog')

from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.models import load_model, Sequential, Model
from tensorflow.keras.layers import Flatten, Dense, Dropout, Conv2D, MaxPooling2D, Input, GlobalMaxPooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator as IDG
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

i = Input(shape=(224, 224, 3))
x = ResNet50(weights='imagenet', include_top=False)(i)
x = GlobalMaxPooling2D()(x)
x = Flatten()(x)
x = Dense(1024, activation='relu')(x)
x = Dense(512, activation='relu')(x)
x = Dense(2, activation='softmax')(x)
model = Model(inputs=i, outputs=x)

model.get_layer('resnet50').trainable = False
model.summary()

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

stop = EarlyStopping(monitor='val_loss', patience=100)
saving = ModelCheckpoint('resnet50.h5', save_weights_only=False, save_best_only=True)

# Augment the dataset before training
train_datagen = IDG( rescale = 1.0/255.,
                     rotation_range=40,
                     width_shift_range=0.2,
                     height_shift_range=0.2,
                     shear_range=0.2,
                     zoom_range=0.2,
                     horizontal_flip=True,
                     fill_mode='nearest')
test_datagen  = IDG( rescale = 1.0/255.)
train_generator = train_datagen.flow_from_directory(train,
                                                    batch_size=1,
                                                    class_mode='categorical',
                                                    target_size=(224, 224))
validation_generator = test_datagen.flow_from_directory(validation,
                                                        batch_size=1,
                                                        class_mode='categorical',
                                                        target_size=(224, 224))
# Model training
history = model.fit(train_generator,
                    validation_data=validation_generator,
                    steps_per_epoch=1,
                    epochs=100,
                    validation_steps=50,
                    verbose=1,
                    callbacks=[stop, saving])

#show results
import numpy as np
import matplotlib.pyplot as plt
acc      = history.history['accuracy']
val_acc  = history.history['val_accuracy']
loss     = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))  # Get number of epochs
plt.plot(epochs, acc)
plt.plot(epochs, val_acc)
plt.title('Training and validation accuracy')
plt.figure()
plt.plot(epochs, loss)
plt.plot(epochs, val_loss)
plt.title('Training and validation loss')
plt.show()

Why are you using a batch size of 1 and steps_per_epoch=1? It means that every epoch you train your model on exactly one training image and evaluate it on 50 validation images (since validation_steps=50).
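
As a rule of thumb, steps_per_epoch should be about the number of training samples divided by the batch size, so the model sees every image once per epoch. A rough sketch with your 2000/1000 split (the batch size of 20 is just an example value):

batch_size = 20  # example value, not from the original code
train_generator = train_datagen.flow_from_directory(train,
                                                    batch_size=batch_size,
                                                    class_mode='categorical',
                                                    target_size=(224, 224))
validation_generator = test_datagen.flow_from_directory(validation,
                                                        batch_size=batch_size,
                                                        class_mode='categorical',
                                                        target_size=(224, 224))

history = model.fit(train_generator,
                    validation_data=validation_generator,
                    steps_per_epoch=2000 // batch_size,   # one full pass over the 2000 training images
                    validation_steps=1000 // batch_size,  # one full pass over the 1000 validation images
                    epochs=10,
                    verbose=1)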


Before, I used batch_size = 20 and steps_per_epoch = 100, but it was just as unstable, so I changed both to 1 to try another way.
And I still have the same problem.
Any other ideas?

What do you mean by “unstable”?


https://files.fm/thumb_show.php?i=hmwcq8hk7
https://files.fm/thumb_show.php?i=2t9bas9w4

Have you tried saving in the SavedModel format instead of h5?
Also, just to check that your datagen pipeline is OK, try to see whether you can overfit on train+validation, and use a more consistent batch_size.
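
For reference, a minimal sketch of both suggestions (the path 'resnet50_saved' and the epoch count are placeholders, not from the original code):

# In TF 2.x, a checkpoint path without a .h5 extension is saved in the
# SavedModel format ('resnet50_saved' is a placeholder name).
saving = ModelCheckpoint('resnet50_saved',
                         save_weights_only=False,
                         save_best_only=True)

# Overfit sanity check: train on one fixed batch over and over.
# If accuracy does not approach 1.0, suspect the pipeline or the labels,
# not the optimizer settings.
x_small, y_small = next(train_generator)
model.fit(x_small, y_small, epochs=50, verbose=0)
print(model.evaluate(x_small, y_small))  # expect loss near 0, accuracy near 1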


Using the SavedModel format instead of h5, a batch_size of 20 photos, and steps_per_epoch = 100 so that all 2000 photos are seen each epoch, I trained for 10 epochs. It took about 1 hour to train, and I got this:
https://fv2-5.failiem.lv/thumb_show.php?i=ue58sz2f3&download_checksum=14cc5cf7ddebcb68554858d421da881ec7c64e71&download_timestamp=1630702313
https://fv2-5.failiem.lv/thumb_show.php?i=ew2hatesr&download_checksum=5aea3c29c4061a1b4e4dc52ea493135ed8860bd4&download_timestamp=1630702221
:confused:

Visually inspect your augmented dataset together with the corresponding labels, and check whether you can overfit with a fine-tuning step:
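
A minimal sketch of both checks, assuming the generators and model from the post above (the learning rate is just a common starting value for fine-tuning, not a prescription):

import matplotlib.pyplot as plt

# Visually inspect one augmented batch with its labels
images, labels = next(train_generator)
class_names = {v: k for k, v in train_generator.class_indices.items()}
plt.figure(figsize=(10, 10))
for n in range(min(9, len(images))):
    plt.subplot(3, 3, n + 1)
    plt.imshow(images[n])  # already rescaled to [0, 1]
    plt.title(class_names[int(labels[n].argmax())])
    plt.axis('off')
plt.show()

# Fine-tuning step: unfreeze the ResNet50 backbone and recompile with a
# low learning rate so the pre-trained weights are not destroyed.
from tensorflow.keras.optimizers import Adam
model.get_layer('resnet50').trainable = True
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss='categorical_crossentropy',
              metrics=['accuracy'])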
