Does ImageDataGenerator apply data augmentation to validation data if validation_split is specified?

Nithin_A_R · June 27, 2021, 2:11pm

Hi, I was going through the tutorials at tf.keras.preprocessing.image.ImageDataGenerator | TensorFlow v2.16.1 and Data augmentation | TensorFlow Core when this question came up.
Suppose I have a training directory with some images, and I use ImageDataGenerator to augment the data with validation_split = 0.2, as shown below:

```python
from tensorflow import keras

# train_dir, test_dir and img_size are defined elsewhere.

# Augmenting generator; validation_split reserves 20% of train_dir for validation.
train_datagen = keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255, width_shift_range=0.2,
    shear_range=0.2, height_shift_range=0.2,
    zoom_range=0.2, validation_split=0.2,
    horizontal_flip=True)
# Test data is only rescaled.
test_datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

train_ds = train_datagen.flow_from_directory(
    train_dir, seed=42,
    target_size=img_size, subset='training',
    batch_size=32)
valid_ds = train_datagen.flow_from_directory(
    train_dir, seed=42,
    target_size=img_size, subset='validation',
    batch_size=32)

test_ds = test_datagen.flow_from_directory(
    test_dir, seed=42,
    target_size=img_size,
    batch_size=32)
```

My question is this:
Does the image augmentation also apply to valid_ds by default? If so, wouldn't that create more bias towards the original training data? (As mentioned in Data augmentation | TensorFlow Core, we should not augment the validation data.)
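One quick way to look at the first question is to check which generator each subset iterator holds on to. The sketch below uses flow() on random in-memory arrays instead of flow_from_directory, purely so it runs without an image directory; the array shapes and batch sizes are made up for illustration. As far as I can tell from the Keras source, subset only decides which samples an iterator sees, while the random transforms come from the generator itself, so both subsets of a single ImageDataGenerator get augmented.

```python
import numpy as np
from tensorflow import keras

# Same augmenting configuration as in the question (rescale omitted for brevity).
train_datagen = keras.preprocessing.image.ImageDataGenerator(
    width_shift_range=0.2, height_shift_range=0.2, shear_range=0.2,
    zoom_range=0.2, horizontal_flip=True, validation_split=0.2)

# Ten random 8x8 RGB "images" standing in for the files in train_dir
# (illustrative data only).
x = np.random.default_rng(0).random((10, 8, 8, 3)).astype("float32")

train_it = train_datagen.flow(x, batch_size=4, subset="training")
valid_it = train_datagen.flow(x, batch_size=2, subset="validation")

# Each iterator only sees its own slice of the samples...
print(train_it.n, valid_it.n)
# ...but both keep a reference to the SAME generator, whose random
# transforms are applied to every batch they produce.
print(valid_it.image_data_generator is train_datagen)
```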

What if the validation_split argument were passed to model.fit() instead? Would the validation split then be taken from the augmented training data?
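For the second question, the Keras Model.fit documentation says validation_split takes the last samples of x and y before shuffling, and that it is only supported for in-memory arrays/tensors, not for generators such as the iterators returned by flow_from_directory. The NumPy-only sketch below mimics that documented slicing to show what happens if augmented copies are appended to the arrays before calling fit. The helper split_like_fit and the flip-based "augmentation" are hypothetical, and the floor-based cut point is an assumption about the exact rounding.

```python
import math

import numpy as np


def split_like_fit(x, y, validation_split):
    """Hypothetical helper mimicking fit(validation_split=...) on arrays:
    the LAST fraction of samples becomes validation data, before shuffling.
    The floor rounding of the cut point is an assumption."""
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must be in (0, 1)")
    split_at = int(math.floor(len(x) * (1.0 - validation_split)))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])


# Illustrative setup: 4 tiny 2x3 "images" plus a horizontally flipped copy of
# each, appended AFTER the originals (one way to pre-augment arrays).
originals = np.arange(4 * 2 * 3).reshape(4, 2, 3).astype("float32")
flipped = originals[:, :, ::-1]
x = np.concatenate([originals, flipped])           # shape (8, 2, 3)
y = np.concatenate([np.arange(4), np.arange(4)])   # label = source image index

(x_train, y_train), (x_val, y_val) = split_like_fit(x, y, validation_split=0.25)

# The validation slice holds only flipped copies of images 2 and 3, whose
# un-flipped originals are still in the training slice: a leak.
print(x_val.shape)  # (2, 2, 3)
print(y_val)        # [2 3]
```

So with arrays, the split is purely positional; whether validation ends up containing augmented images depends entirely on how the arrays were built.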

Bhack · June 28, 2021, 10:41am

The corresponding Keras issue was just closed 4 days ago.

Check: Split train data into training and validation when using ImageDataGenerator and model.fit_generator (Issue #5862, keras-team/keras on GitHub)

Nithin_A_R · June 28, 2021, 2:33pm

Thanks a lot!
So, one approach suggested there is to create separate ImageDataGenerators for the training and validation subsets while keeping the same seed value. It works!
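A minimal sketch of that two-generator approach, reusing the augmentation settings from the question: the augmenting generator is used only for the 'training' subset, and a plain, rescale-only generator is used for the 'validation' subset. Both must share the same validation_split so the files are divided the same way, and both flow_from_directory calls use the same seed, as suggested above. The helper names build_datagens and build_iterators are hypothetical, and train_dir/img_size are assumed to be defined as in the question.

```python
from tensorflow import keras

VAL_SPLIT = 0.2  # must be identical for both generators
SEED = 42        # same seed for both flow_from_directory calls


def build_datagens(val_split=VAL_SPLIT):
    """Hypothetical helper: (augmenting generator, plain generator)."""
    aug_datagen = keras.preprocessing.image.ImageDataGenerator(
        rescale=1. / 255,
        width_shift_range=0.2, height_shift_range=0.2,
        shear_range=0.2, zoom_range=0.2,
        horizontal_flip=True,
        validation_split=val_split)
    # No random transforms here: only the deterministic rescaling.
    plain_datagen = keras.preprocessing.image.ImageDataGenerator(
        rescale=1. / 255,
        validation_split=val_split)
    return aug_datagen, plain_datagen


def build_iterators(train_dir, img_size, batch_size=32, seed=SEED):
    """Hypothetical helper: augmented training subset, un-augmented validation subset."""
    aug_datagen, plain_datagen = build_datagens()
    train_ds = aug_datagen.flow_from_directory(
        train_dir, target_size=img_size, batch_size=batch_size,
        subset="training", seed=seed)
    valid_ds = plain_datagen.flow_from_directory(
        train_dir, target_size=img_size, batch_size=batch_size,
        subset="validation", seed=seed)
    return train_ds, valid_ds
```

Usage would then look like train_ds, valid_ds = build_iterators(train_dir, img_size) followed by model.fit(train_ds, validation_data=valid_ds, ...), so fit never needs its own validation_split.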
