I am trying to automatically (and repeatedly) restart a finished deep-learning training session in TensorFlow. Currently, to restart, I manually restart my kernel and re-run the training code.
Questions:
- I understand that "when training deep learning models, the model's parameters, activations, and gradients are stored in the GPU memory." How would I clear the GPU memory without having to manually restart my kernel?
- When I automate the restart of model training, do I need to restart from the very beginning (importing libraries + data preprocessing), or can I restart from the point where I build and fit the model?
- How would I implement this?
Thanks in advance!
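For the GPU-memory part, a minimal sketch (the helper name `free_gpu_memory` is my own, not a TensorFlow API): after dropping your own references to the finished model, `tf.keras.backend.clear_session()` resets Keras' global graph/layer state so the tensors it held become garbage-collectable. Note that TensorFlow keeps its GPU allocator pool alive, so the memory is reused within the same process rather than returned to the OS, but that is enough to train a fresh model without a kernel restart.

```python
import gc

import tensorflow as tf


def free_gpu_memory():
    """Release Keras' global state so a finished model's memory can be reused.

    Call this after deleting your own references (e.g. `del uNet_model`).
    """
    tf.keras.backend.clear_session()  # reset Keras' graph/layer registry
    gc.collect()                      # collect the now-unreferenced objects
```

Usage would look like `del uNet_model; free_gpu_memory()` between training rounds.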
Comment: This is how I call, compile, fit, and save the model.
# Get model
def get_model():
    return build_model(input_shape, n_classes)

uNet_model = get_model()

# Compile model
uNet_model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.0001),
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])

# Print model summary
uNet_model.summary()

# Fit model (one-hot encoded masks, i.e. non-sparse labels)
history = uNet_model.fit(train_rgb_input, train_mask_categorical,
                         batch_size=1,
                         epochs=1000,
                         validation_data=(val_rgb_input, val_mask_categorical),
                         # class_weight=class_weights,
                         verbose=1, shuffle=True)

# Save model
uNet_model.save("xxxx.hdf5")
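To automate the restarts themselves, you do not need to re-run the imports or the data preprocessing: those stay in memory. Only the build/compile/fit/save steps need to run again, with the session cleared between rounds. A sketch (`train_with_restarts` is a hypothetical helper, not a TensorFlow API):

```python
import gc
import os

import tensorflow as tf


def train_with_restarts(build_and_compile, fit_fn, n_restarts, save_dir="."):
    """Repeat build -> fit -> save, freeing GPU memory between rounds.

    build_and_compile: callable returning a freshly compiled model
                       (e.g. wrapping get_model() plus the compile() call).
    fit_fn:            callable taking the model and running model.fit()
                       on the already-preprocessed training arrays.
    """
    for run in range(n_restarts):
        model = build_and_compile()    # fresh model each round
        history = fit_fn(model)        # reuses the in-memory data
        model.save(os.path.join(save_dir, f"run_{run}.hdf5"))

        # Drop references and reset Keras' global state so the next
        # round can reuse the GPU memory without a kernel restart.
        del model, history
        tf.keras.backend.clear_session()
        gc.collect()
```

With the snippet above, `build_and_compile` would wrap `get_model()` plus the `uNet_model.compile(...)` call, and `fit_fn` would wrap the `uNet_model.fit(...)` call with your training arrays.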