How to Control Memory Growth When Using TensorFlow in Multi-Round Training?

Hello TensorFlow community,

I’m facing an issue related to memory growth when using TensorFlow for a multi-round training process. Specifically, I have a model training loop in which I generate training and evaluation data in each round, and my memory usage seems to keep growing, eventually causing out-of-memory errors. I’m trying to understand how I can effectively manage or release memory during these iterations.

Here is a simplified version of my code:


import gc

# Simplified multi-round training loop: fresh tensors are generated every round
for num_round in range(1, 1 + total_num_round):
    # Generate this round's training and evaluation data as new tensors
    train_data = generate_all_batch_s_path_samples(s_0_, net_list_c, batch_size, epochs_t + 1)
    eval_data = generate_all_batch_s_path_samples(s_0_, net_list_c, batch_size, eval_num_batch)

    # ... train and evaluate on this round's data ...

    # Drop the Python references and force a garbage-collection pass
    del train_data, eval_data
    gc.collect()

Issues I’m Facing:

  • The train_data and eval_data tensors generated in each round occupy a lot of memory, and I cannot seem to release it effectively, so memory usage grows continuously across rounds.
  • I have tried several approaches to control memory usage:
    1. Pre-allocating variables and overwriting them with assign() instead of redefining train_data and eval_data in every round (see the sketch after this list).
    2. Freeing memory with del train_data, eval_data followed by gc.collect(), but neither method worked.
  • The function generate_all_batch_s_path_samples is not decorated with tf.function because it uses threading for parallel computation, which makes it incompatible with tf.function.
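
For reference, the assign() attempt from point 1 looked roughly like this. The shapes and the feature_dim name below are placeholders, not my real dimensions; the idea was to allocate the variables once and overwrite their contents each round:

import tensorflow as tf

# Placeholder shapes: feature_dim stands in for my actual sample dimensionality
train_buf = tf.Variable(tf.zeros([batch_size * (epochs_t + 1), feature_dim]), trainable=False)
eval_buf = tf.Variable(tf.zeros([batch_size * eval_num_batch, feature_dim]), trainable=False)

for num_round in range(1, 1 + total_num_round):
    # Overwrite the pre-allocated buffers instead of binding new tensors
    train_buf.assign(generate_all_batch_s_path_samples(s_0_, net_list_c, batch_size, epochs_t + 1))
    eval_buf.assign(generate_all_batch_s_path_samples(s_0_, net_list_c, batch_size, eval_num_batch))

    # ... train and evaluate using train_buf / eval_buf ...

Even with this variant, memory kept growing round over round.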

Questions:

  1. Is there a more effective way to release memory between iterations, besides using tf.keras.backend.clear_session()? (Rough placement sketched after this list.)
  2. Is there a recommended approach to managing memory growth in multi-round training scenarios like this?
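
For question 1, this is roughly the placement I have in mind for clear_session(). My understanding is that it resets Keras's global state, so any Keras models would need to be rebuilt after the call, which is part of why I'm hoping for an alternative:

import gc
import tensorflow as tf

for num_round in range(1, 1 + total_num_round):
    train_data = generate_all_batch_s_path_samples(s_0_, net_list_c, batch_size, epochs_t + 1)
    eval_data = generate_all_batch_s_path_samples(s_0_, net_list_c, batch_size, eval_num_batch)

    # ... train and evaluate ...

    del train_data, eval_data
    tf.keras.backend.clear_session()  # resets Keras's global state between rounds
    gc.collect()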

Any advice, suggestions, or code examples would be greatly appreciated! Thank you all in advance for your help.

Context:

  • I’m using TensorFlow 2.16.0.
  • The data generation process (generate_all_batch_s_path_samples) creates new tensors for training and evaluation in each round; a simplified sketch of its structure follows below.
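
To make the threading constraint concrete, the function follows this general pattern (heavily simplified; generate_one_batch is a stand-in for the actual per-batch path-sampling computation):

import threading
import tensorflow as tf

def generate_all_batch_s_path_samples(s_0_, net_list_c, batch_size, num_batch):
    # Each worker thread produces one batch of path samples as a new tensor.
    # generate_one_batch is a placeholder for the real per-batch computation.
    results = [None] * num_batch

    def worker(i):
        results[i] = generate_one_batch(s_0_, net_list_c, batch_size)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_batch)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Stacking allocates yet another new tensor on every call
    return tf.stack(results)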

Thanks again for your support!