Hi everyone,
When training my model with model.fit(), using tf.data for the training and validation data, GPU usage dips to 0% after each epoch, even though I am calling prefetch() on the tf.data.Dataset.
Has anyone experienced something similar?
Unfortunately, I cannot share any code.
Thank you in advance.
My first two guesses would be:
- The dataset has to refill the shuffle buffer at the start of every epoch. That happens with model.fit(ds.shuffle(buffer_size).repeat()); calling model.fit(ds.repeat().shuffle(buffer_size), steps_per_epoch=N) instead fills the buffer once and keeps it full across epoch boundaries.
- Maybe something in the evaluation logic that runs at the end of each epoch?
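The refill stall from the first guess can be seen without TensorFlow at all. Below is a minimal pure-Python model of a shuffle buffer (a simplification of what Dataset.shuffle does; the function name and the driver code are mine, not part of the tf.data API):

```python
import itertools
import random

def shuffled(source, buffer_size, rng):
    """Simplified model of Dataset.shuffle(buffer_size): fill a buffer,
    then emit a random element and replace it with the next input item.
    Nothing is emitted until the buffer is full -- that is the stall."""
    buf = []
    for item in source:
        if len(buf) < buffer_size:
            buf.append(item)        # filling phase: no output yet
            continue
        i = rng.randrange(buffer_size)
        yield buf[i]
        buf[i] = item
    rng.shuffle(buf)                # drain what is left at end of input
    yield from buf

data = list(range(100))
rng = random.Random(0)

# shuffle(...).repeat(): every epoch restarts shuffled(), so the buffer
# is empty again and must be refilled before the first element appears.
epoch1 = list(shuffled(data, 100, rng))
epoch2 = list(shuffled(data, 100, rng))

# repeat().shuffle(): one shuffle over the concatenated epochs, so the
# buffer stays full across the epoch boundary (epochs blur together).
both = list(shuffled(itertools.chain(data, data), 100, rng))

print(len(epoch1), len(epoch2), len(both))  # 100 100 200
```

With a large buffer (here the full dataset), the filling phase is exactly the pause you would see at each epoch boundary in the first ordering, while the second ordering pays that cost only once.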
Thank you for your reply.
Currently I am using

model.fit(train_data, epochs=self.epochs, validation_data=val_data, verbose=1)

where train_data is a tf.data.Dataset built with

train_data = tf.data.Dataset.from_tensor_slices((train_ivs, train_logr, train_metric))
train_data = train_data.shuffle(buffer_size=train_ivs.shape[0], seed=self.seed,
reshuffle_each_iteration=True)
train_data = train_data.batch(self.batch_size)
train_data = train_data.prefetch(tf.data.AUTOTUNE)
The evaluation step does not seem to cause any problem either.
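For reference, the repeat-before-shuffle ordering suggested above could look like the sketch below. This is only an illustration under assumed shapes (tf.data.Dataset.range stands in for your from_tensor_slices dataset, and num_examples / batch_size are placeholder values), not your actual training setup:

```python
import tensorflow as tf

num_examples = 1000
batch_size = 32

ds = tf.data.Dataset.range(num_examples)
# repeat() first, then shuffle(): the shuffle buffer is filled once and
# stays full across epoch boundaries, at the cost of blurring epochs.
ds = ds.repeat().shuffle(buffer_size=num_examples, seed=0)
ds = ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)

# The repeated dataset is infinite, so model.fit() then needs
# steps_per_epoch to know where an "epoch" ends:
steps_per_epoch = num_examples // batch_size
```

You would then call model.fit(ds, epochs=..., steps_per_epoch=steps_per_epoch, ...); whether the dip disappears would confirm or rule out the shuffle-buffer guess.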