Hi, I was going through the tutorials at tf.keras.preprocessing.image.ImageDataGenerator | TensorFlow v2.16.1
and Data augmentation | TensorFlow Core when I ran into a question.
Suppose I have a training directory with some images and I use ImageDataGenerator to augment the data with validation_split = 0.2, as shown below.
from tensorflow import keras

# train_dir, test_dir and img_size are defined elsewhere.
# Augmenting generator; validation_split reserves 20% of train_dir.
train_datagen = keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255, width_shift_range=0.2,
    shear_range=0.2, height_shift_range=0.2,
    zoom_range=0.2, validation_split=0.2,
    horizontal_flip=True)
# Test images are only rescaled.
test_datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

train_ds = train_datagen.flow_from_directory(
    train_dir, seed=42,
    target_size=img_size, subset='training',
    batch_size=32)
# Built from train_datagen, i.e. the same augmenting generator.
valid_ds = train_datagen.flow_from_directory(
    train_dir, seed=42,
    target_size=img_size, subset='validation',
    batch_size=32)
test_ds = test_datagen.flow_from_directory(
    test_dir, seed=42,
    target_size=img_size,
    batch_size=32)
My questions are:
Does the image augmentation apply to valid_ds by default? If so, wouldn't that skew the validation results toward the augmented training data? (As mentioned in Data augmentation | TensorFlow Core, we should not augment the validation data.)
What if the validation_split argument were passed to the model.fit() method instead? Would the validation split then be taken from the augmented training data?
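For context on the second question: my reading of the Keras documentation is that validation_split in model.fit() is only supported when x and y are arrays or tensors (not generators like the ones above), and that it holds out the last fraction of the samples before any shuffling. Below is a minimal sketch of that slicing in plain Python, just to make the behaviour I am asking about concrete. split_last_fraction is a hypothetical helper I made up for illustration, not a Keras function, and the exact rounding Keras uses may differ.

```python
def split_last_fraction(samples, validation_split):
    """Hypothetical helper (not part of Keras).

    Mimics how model.fit(validation_split=...) is documented to behave
    for array inputs: the LAST fraction of the samples is held out for
    validation, and this happens before any shuffling.
    """
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must be between 0 and 1")
    n_val = int(len(samples) * validation_split)  # size of held-out tail
    split_at = len(samples) - n_val
    return samples[:split_at], samples[split_at:]


# Ten stand-in "images", in the order they would be passed to fit().
samples = [f"img_{i}" for i in range(10)]
train_part, val_part = split_last_fraction(samples, 0.2)

print(train_part)  # the first 8 samples
print(val_part)    # the last 2 samples
```

If that reading is right, the held-out samples would simply be whatever sits at the end of the arrays passed to fit(), so whether they are augmented depends on whether the arrays were augmented before the call.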