I am using TensorFlow datasets for image data, created with image_dataset_from_directory, and I want to apply augmentation to the training set when fit is called:
import tensorflow as tf
from tensorflow.keras.utils import image_dataset_from_directory
from tensorflow.keras.layers import RandomFlip
#
train_root = data_path + train_dir + '/train/'
train = image_dataset_from_directory(
    train_root,
    batch_size = batch_size,
    shuffle = False,
    labels = 'inferred',
    label_mode = 'categorical',
    color_mode = 'grayscale',
    image_size = (resize, resize),
    seed = random_seed)
#
# val_root is built the same way as train_root
val = image_dataset_from_directory(
    val_root,
    batch_size = batch_size,
    shuffle = False,
    labels = 'inferred',
    label_mode = 'categorical',
    color_mode = 'grayscale',
    image_size = (resize, resize),
    seed = random_seed)
#
data_augmentation = tf.keras.Sequential([
    RandomFlip("horizontal_and_vertical")])
#
# training = True makes the layer actually augment inside Dataset.map;
# buffer_size exceeds the dataset size, so the shuffle covers all batches.
# batch_size is omitted from fit, since the dataset is already batched.
history = model.fit(
    train.map(lambda x, y: (data_augmentation(x, training = True), y))
         .shuffle(buffer_size = train.cardinality() * batch_size,
                  seed = random_seed,
                  reshuffle_each_iteration = False),
    validation_data = val,
    callbacks = callbacks,
    epochs = epochs)
If I understand correctly what happens when data_augmentation is mapped over the train dataset, some but not all of the images are affected (flipped, in my case) on each pass. Since the flips are random, it occurred to me that even after a large number of epochs the model may never have seen some of the original images, because they are replaced by their flipped versions. I verified that with the code above the number of batches stays the same (see the check below), so I believe the images are being flipped in place rather than added to the training data. Is there a way to ensure the model sees all of the original images?
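For reference, this is roughly how I verified the batch count, plus a quick look at how many images in one batch actually get flipped. This is a minimal sketch run against the train dataset defined above; the names augmented, aug, and changed are just for illustration:

# the mapped dataset has exactly as many batches as the original,
# so augmentation replaces images rather than adding new ones
augmented = train.map(lambda x, y: (data_augmentation(x, training = True), y))
print(train.cardinality().numpy())      # e.g. 100 batches
print(augmented.cardinality().numpy())  # still 100 batches
#
# count how many images in a single batch differ from their originals;
# with "horizontal_and_vertical" each image is flipped independently,
# so some images in each batch typically come through unchanged
for x, y in train.take(1):
    aug = data_augmentation(x, training = True)
    changed = tf.reduce_any(tf.not_equal(x, aug), axis = [1, 2, 3])
    print(int(tf.reduce_sum(tf.cast(changed, tf.int32))), 'of', int(x.shape[0]), 'images changed')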