I am trying to automatically (and repeatedly) restart a finished deep-learning training session in TensorFlow. Currently, to restart, I manually restart my kernel and re-run the training code.
Questions:
- I understand that "when training deep learning models, the model's parameters, activations, and gradients are stored in the GPU memory." How would I clear the GPU memory without having to manually restart my kernel?
- When I automate the restart of model training, do I need to restart from the very beginning (importing libraries + data preprocessing), or can I restart from the point where I build and fit the model?
- How would I implement this?
Thanks in advance!
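For the GPU-memory part, a minimal sketch (the helper name `free_gpu_memory` is my own, not a TensorFlow API): after dropping your own references to the finished model, `tf.keras.backend.clear_session()` resets Keras' global graph/layer state so the tensors it held become garbage-collectable. Note that TensorFlow keeps its GPU allocator pool alive, so the memory is reused within the same process rather than returned to the OS, but that is enough to train a fresh model without a kernel restart.

```python
import gc

import tensorflow as tf


def free_gpu_memory():
    """Release Keras' global state so a finished model's memory can be reused.

    Call this after deleting your own references (e.g. `del uNet_model`).
    """
    tf.keras.backend.clear_session()  # reset Keras' graph/layer registry
    gc.collect()                      # collect the now-unreferenced objects
```

Usage would look like `del uNet_model; free_gpu_memory()` between training rounds.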
Comment: This is how I call, compile, fit, and save the model.
# Get model
def get_model():
    return build_model(input_shape, n_classes)

uNet_model = get_model()

# Compile model
uNet_model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.0001),
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])

# Print model summary
uNet_model.summary()

# Fit model (one-hot encoded masks, i.e. non-sparse labels)
history = uNet_model.fit(train_rgb_input, train_mask_categorical,
                         batch_size=1,
                         epochs=1000,
                         validation_data=(val_rgb_input, val_mask_categorical),
                         # class_weight=class_weights,
                         verbose=1, shuffle=True)

# Save model
uNet_model.save("xxxx.hdf5")
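To automate the restarts themselves, you do not need to re-run the imports or the data preprocessing: those stay in memory. Only the build/compile/fit/save steps need to run again, with the session cleared between rounds. A sketch (`train_with_restarts` is a hypothetical helper, not a TensorFlow API):

```python
import gc
import os

import tensorflow as tf


def train_with_restarts(build_and_compile, fit_fn, n_restarts, save_dir="."):
    """Repeat build -> fit -> save, freeing GPU memory between rounds.

    build_and_compile: callable returning a freshly compiled model
                       (e.g. wrapping get_model() plus the compile() call).
    fit_fn:            callable taking the model and running model.fit()
                       on the already-preprocessed training arrays.
    """
    for run in range(n_restarts):
        model = build_and_compile()    # fresh model each round
        history = fit_fn(model)        # reuses the in-memory data
        model.save(os.path.join(save_dir, f"run_{run}.hdf5"))

        # Drop references and reset Keras' global state so the next
        # round can reuse the GPU memory without a kernel restart.
        del model, history
        tf.keras.backend.clear_session()
        gc.collect()
```

With the snippet above, `build_and_compile` would wrap `get_model()` plus the `uNet_model.compile(...)` call, and `fit_fn` would wrap the `uNet_model.fit(...)` call with your training arrays.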