Custom training loop is not utilizing GPU for training

Preet_Sojitra · December 23, 2024, 8:00am

I am trying to train a simple GAN model on the Fashion MNIST dataset, but the GPU is not utilized while training the model on either Kaggle or Colab.
Here’s the code:

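import numpy as np
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt

# Fashion MNIST images scaled to float32 in [0, 1] (matches the generator's sigmoid output)
(X_train, _), _ = keras.datasets.fashion_mnist.load_data()
X_train = X_train.astype("float32") / 255.0

batch_size = 128
dataset = tf.data.Dataset.from_tensor_slices(X_train).shuffle(1000)
dataset = dataset.batch(batch_size, drop_remainder=True).prefetch(1)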

codings_size = 30

generator = keras.models.Sequential([
    keras.layers.Input(shape=[codings_size]),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(150, activation="selu"),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])

discriminator = keras.models.Sequential([
    keras.layers.Input(shape=[28, 28]),
    keras.layers.Flatten(),
    keras.layers.Dense(150, activation="selu"),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(1, activation="sigmoid")
])

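# Combined model: noise -> generator -> discriminator
gan = keras.models.Sequential([generator, discriminator])

discriminator.compile(loss="binary_crossentropy", optimizer="rmsprop")
# Freeze the discriminator before compiling the combined model so that
# gan.train_on_batch only updates the generator's weights
discriminator.trainable = False
gan.compile(loss="binary_crossentropy", optimizer="rmsprop")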

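def plot_multiple_images(images, n_cols=None):
    # Minimal stand-in for the plotting helper, which the post does not show
    images = np.asarray(images)
    n_cols = n_cols or len(images)
    n_rows = (len(images) - 1) // n_cols + 1
    plt.figure(figsize=(n_cols, n_rows))
    for index, image in enumerate(images):
        plt.subplot(n_rows, n_cols, index + 1)
        plt.imshow(image, cmap="binary")
        plt.axis("off")

def train_gan(gan, dataset, batch_size, codings_size, n_epochs=50):
    generator, discriminator = gan.layers
    # Labels are identical every step (drop_remainder=True), so build them once
    y1 = tf.constant([[0.]] * batch_size + [[1.]] * batch_size)  # fake = 0, real = 1
    y2 = tf.constant([[1.]] * batch_size)  # generator wants its fakes classified as real
    for epoch in range(n_epochs):
        print("Epoch {}/{}".format(epoch + 1, n_epochs))
        for X_batch in dataset:
            # phase 1: train the discriminator on fake and real images
            noise = tf.random.normal(shape=[batch_size, codings_size])
            generated_images = generator(noise)
            X_fake_and_real = tf.concat([generated_images, X_batch], axis=0)
            discriminator.trainable = True
            discriminator.train_on_batch(X_fake_and_real, y1)

            # phase 2: train the generator through the frozen discriminator
            noise = tf.random.normal(shape=[batch_size, codings_size])
            discriminator.trainable = False
            gan.train_on_batch(noise, y2)
        plot_multiple_images(generated_images, 8)
        plt.show()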

train_gan(gan, dataset, batch_size, codings_size, n_epochs=1)
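One possible contributor to low GPU utilization (an assumption, not confirmed in this thread) is that every step makes separate eager calls (`generator(noise)`, two `train_on_batch` calls) from Python, and with models this small the per-step Python overhead can outweigh the GPU work. A common alternative is to write both phases as one `tf.function` using `tf.GradientTape`, so TensorFlow traces them into a single graph. The sketch below swaps in that technique; the name `train_step`, the separate RMSprop optimizers, and the random stand-in data are illustrative assumptions, not the poster's code.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

codings_size = 30
batch_size = 128

# Stand-in data (assumption): replace with Fashion MNIST scaled to float32 in [0, 1]
X_train = np.random.rand(1024, 28, 28).astype("float32")
dataset = (tf.data.Dataset.from_tensor_slices(X_train).shuffle(1000)
           .batch(batch_size, drop_remainder=True).prefetch(tf.data.AUTOTUNE))

generator = keras.Sequential([
    keras.layers.Input(shape=[codings_size]),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(150, activation="selu"),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28]),
])
discriminator = keras.Sequential([
    keras.layers.Input(shape=[28, 28]),
    keras.layers.Flatten(),
    keras.layers.Dense(150, activation="selu"),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

bce = keras.losses.BinaryCrossentropy()
d_optimizer = keras.optimizers.RMSprop()  # updates only the discriminator
g_optimizer = keras.optimizers.RMSprop()  # updates only the generator

@tf.function  # traced once, then each call runs as a single graph
def train_step(X_batch):
    batch = tf.shape(X_batch)[0]

    # Phase 1: discriminator sees fakes labelled 0 and real images labelled 1
    noise = tf.random.normal([batch, codings_size])
    fake = generator(noise, training=True)
    X_fake_and_real = tf.concat([fake, X_batch], axis=0)
    y1 = tf.concat([tf.zeros([batch, 1]), tf.ones([batch, 1])], axis=0)
    with tf.GradientTape() as tape:
        d_loss = bce(y1, discriminator(X_fake_and_real, training=True))
    d_grads = tape.gradient(d_loss, discriminator.trainable_variables)
    d_optimizer.apply_gradients(zip(d_grads, discriminator.trainable_variables))

    # Phase 2: generator is rewarded when the discriminator outputs 1 for its fakes;
    # only generator variables are updated, so no trainable flag toggling is needed
    noise = tf.random.normal([batch, codings_size])
    with tf.GradientTape() as tape:
        y_pred = discriminator(generator(noise, training=True), training=True)
        g_loss = bce(tf.ones_like(y_pred), y_pred)
    g_grads = tape.gradient(g_loss, generator.trainable_variables)
    g_optimizer.apply_gradients(zip(g_grads, generator.trainable_variables))
    return d_loss, g_loss

n_epochs = 1
for epoch in range(n_epochs):
    for X_batch in dataset:
        d_loss, g_loss = train_step(X_batch)
    print(f"Epoch {epoch + 1}/{n_epochs}: d_loss={float(d_loss):.3f} g_loss={float(g_loss):.3f}")
```

The first call to `train_step` is slower because it traces the graph; later steps reuse it. Whether this raises utilization on Kaggle or Colab is not verified here, and for models this small the GPU may still be far from saturated.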

Kiran_Sai_Ramineni · December 26, 2024, 9:24am

Hi @Preet_Sojitra, instead of using a buffer size of 1 in prefetch, could you please try prefetch(AUTOTUNE), with AUTOTUNE = tf.data.AUTOTUNE? Also try increasing the batch_size. Thank you.
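The reply's suggestion can be sketched as the input pipeline below. It assumes `X_train` holds the Fashion MNIST training images as float32 in [0, 1] (random stand-in data is used here), and the batch size of 512 is only an example value. It also prints the GPUs TensorFlow can see, since an empty list would mean everything runs on the CPU regardless of the pipeline settings.

```python
import numpy as np
import tensorflow as tf

# GPUs visible to TensorFlow; an empty list means all ops run on the CPU
print("GPUs:", tf.config.list_physical_devices("GPU"))

# Stand-in data (assumption): replace with the real images, e.g.
# X_train.astype("float32") / 255.0, so they match the generator's float32 output
X_train = np.random.rand(2048, 28, 28).astype("float32")

AUTOTUNE = tf.data.AUTOTUNE
batch_size = 512  # larger than 128 so each step gives the GPU more work (example value)

dataset = (
    tf.data.Dataset.from_tensor_slices(X_train)
    .shuffle(1000)
    .batch(batch_size, drop_remainder=True)
    .prefetch(AUTOTUNE)  # let tf.data choose the prefetch buffer size
)
```

These settings mainly help when the input pipeline is the bottleneck; if the per-step Python overhead of the training loop dominates instead, utilization may stay low.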
