SparseTensor memory leak

Thersites · August 1, 2023, 1:15pm

Hello General,

(tensorflow 2.13 / tensorflow-metal 2.13 / OS X 13)
I am experiencing an apparent memory leak in a pipeline where the inputs to the first (dense) layer are large SparseTensors. As training data is read and passed to the model, memory use keeps increasing at a rate proportional to batch size, and the shell runs out of memory (OOM) after a few epochs.

I read column indices, stored as dense tensors in TFRecords, and consume the records with tf.data.TFRecordDataset using the two functions below.
The leak does not occur when I create the SparseTensors inside the function from tensors that are not read from the records. Those tensors are otherwise identical to the ones generated from the training data, and they are manipulated in the same way inside the function.
I parse the input tensors (list_a / list_b) as tf.io.FixedLenSequenceFeature, but parsing them as tf.io.VarLenFeature shows the same behavior.
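That both feature specs behave the same is expected: they decode the same proto and differ only in the container they return. The sketch below uses a hypothetical `make_sequence_example` helper and made-up indices to show that `FixedLenSequenceFeature([])` yields a dense `[num_steps]` tensor, while `VarLenFeature` yields a SparseTensor whose `.values` hold the same indices. With `VarLenFeature`, `_to_sparse_tensor` would need `.values` (or `tf.sparse.to_dense`) before `tf.stack`, which only accepts dense tensors.

```python
import tensorflow as tf

def make_sequence_example(list_a):
    # Hypothetical helper: one feature_list step per column index,
    # each step holding a single int64 value.
    steps = [tf.train.Feature(int64_list=tf.train.Int64List(value=[v]))
             for v in list_a]
    return tf.train.SequenceExample(
        feature_lists=tf.train.FeatureLists(
            feature_list={"list_a": tf.train.FeatureList(feature=steps)}))

serialized = make_sequence_example([3, 17, 42]).SerializeToString()

# FixedLenSequenceFeature([]) -> dense int64 tensor of shape [num_steps].
_, fixed = tf.io.parse_single_sequence_example(
    serialized,
    sequence_features={"list_a": tf.io.FixedLenSequenceFeature([], dtype=tf.int64)})

# VarLenFeature -> SparseTensor of shape [num_steps, max_values_per_step].
_, var = tf.io.parse_single_sequence_example(
    serialized,
    sequence_features={"list_a": tf.io.VarLenFeature(dtype=tf.int64)})
var_values = var["list_a"].values  # the same indices, flattened in order

print(fixed["list_a"].numpy(), var_values.numpy())
```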

The _to_sparse_tensor function runs in graph mode (because it is passed to Dataset.map), so one possible cause of the leak is that the computation graph keeps being extended during training. However, I can't see where this would happen: the arguments passed are all tensors of constant shape, and neither function captures anything from the surrounding scope in an obvious way.
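One cheap way to test the graph-growth hypothesis: Python side effects in a function passed to `Dataset.map` run only while TensorFlow traces it into a graph, not once per element. If the trace count stays constant across epochs, the map function is not being retraced. In this minimal sketch, `traced_fn` and `trace_count` are hypothetical stand-ins; the same counter could be placed at the top of `_to_sparse_tensor`.

```python
import tensorflow as tf

trace_count = 0  # incremented only while TensorFlow traces the function

def traced_fn(x):
    global trace_count
    # Plain Python runs at trace time; tf.print would run per element.
    trace_count += 1
    return x * 2

ds = tf.data.Dataset.range(1000).map(traced_fn)
count_after_build = trace_count  # map() traces when the dataset is built

for _ in range(3):  # several "epochs", each with a fresh iterator
    for _ in ds:
        pass

print("traces at build:", count_after_build, "after 3 epochs:", trace_count)
```

If the count grew with every epoch, the function would be retraced and new graph nodes created each time; an unchanged count points away from graph growth in the input pipeline.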

Does anyone have an idea where the problem may lie, or how I can debug it?
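One way to narrow this down, assuming a Unix-like system such as macOS, is to iterate the input pipeline on its own (no model, no fit()) and record peak resident memory after each epoch. The helpers `peak_rss_mib` and `profile_pipeline` and the constant-input `const_ds` below are hypothetical, a sketch rather than a definitive profiler.

```python
import resource
import sys

import tensorflow as tf

def peak_rss_mib():
    # ru_maxrss is reported in bytes on macOS and in KiB on Linux.
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss / (1024 * 1024) if sys.platform == "darwin" else rss / 1024

def profile_pipeline(dataset, epochs=3):
    # Iterate the pipeline alone and record peak memory after each epoch.
    readings = []
    for _ in range(epochs):
        for _ in dataset:
            pass
        readings.append(peak_rss_mib())
    return readings

# Hypothetical control pipeline: constant 6-element index lists, analogous
# to the case reported above as not leaking.
const_ds = (tf.data.Dataset.from_tensors(
                tf.constant([1, 2, 3, 4, 5, 6], dtype=tf.int64))
            .repeat(10000)
            .batch(256))

print(profile_pipeline(const_ds))
```

Running the same probe on the TFRecord-backed pipeline and on a constant-input control separates the two cases: peak RSS can never decrease, so the signal is whether it keeps climbing epoch after epoch. If the pipeline alone stays flat, the growth is more likely in the model or training step than in data loading.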

Thanks.

```python
import tensorflow as tf

def _to_sparse_tensor(list_a, list_b, scalar_a, scalar_b):
    a_size = tf.constant(100000)
    b_size = tf.constant(50)
    # Every entry goes in row 0; assumes list_a and list_b each hold 6 column indices.
    row_indices = tf.constant(0, shape=[6,], dtype=tf.int64)
    indices = tf.stack([row_indices, list_a], 1)
    values = tf.constant(1, shape=[6,], dtype=tf.int64)
    sparse_a = tf.sparse.SparseTensor(indices=indices,
                                      values=values,
                                      dense_shape=[1, a_size])
    indices = tf.stack([row_indices, list_b], 1)
    sparse_b = tf.sparse.SparseTensor(indices=indices,
                                      values=values,
                                      dense_shape=[1, b_size])
    # Concatenate along columns into a [1, a_size + b_size] multi-hot row,
    # then flatten it to a rank-1 SparseTensor.
    in_onehot = tf.sparse.concat(1, [sparse_a, sparse_b])
    in_onehot = tf.sparse.reshape(in_onehot, [a_size + b_size,])
    return in_onehot, scalar_a, scalar_b

def _parse_function_list(example):
    # Each field is stored as a feature_list of single-int64 steps.
    features_dict = {"list_a": tf.io.FixedLenSequenceFeature([], dtype=tf.int64),
                     "list_b": tf.io.FixedLenSequenceFeature([], dtype=tf.int64),
                     "scalar_a": tf.io.FixedLenSequenceFeature([], dtype=tf.int64),
                     "scalar_b": tf.io.FixedLenSequenceFeature([], dtype=tf.int64)}

    context_dict = {}
    context, features = tf.io.parse_single_sequence_example(
        example,
        sequence_features=features_dict,
        context_features=context_dict
    )

    return (features['list_a'], features['list_b'],
            features['scalar_a'], features['scalar_b'])
```
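The post does not show how the two functions are chained. A minimal sketch, assuming the usual TFRecordDataset → map → map → batch chain, follows. It writes a few synthetic SequenceExamples to a temporary file. The record contents, the batch size, and the names `write_records`, `parse` and `to_sparse` are hypothetical; `parse` and `to_sparse` are compact copies of the two functions above, inlined so the block runs on its own.

```python
import os
import tempfile

import tensorflow as tf

A_SIZE, B_SIZE, N = 100000, 50, 6  # sizes and list length taken from the post

def int64_steps(values):
    # One feature_list step per value, each holding a single int64.
    return tf.train.FeatureList(feature=[
        tf.train.Feature(int64_list=tf.train.Int64List(value=[v])) for v in values])

def write_records(path, num_records):
    # Hypothetical synthetic data: 6 sorted, distinct column indices per list.
    with tf.io.TFRecordWriter(path) as writer:
        for i in range(num_records):
            ex = tf.train.SequenceExample(feature_lists=tf.train.FeatureLists(
                feature_list={
                    "list_a": int64_steps([i + k for k in range(N)]),
                    "list_b": int64_steps(list(range(N))),
                    "scalar_a": int64_steps([i]),
                    "scalar_b": int64_steps([2 * i]),
                }))
            writer.write(ex.SerializeToString())

def parse(example):
    spec = {k: tf.io.FixedLenSequenceFeature([], dtype=tf.int64)
            for k in ("list_a", "list_b", "scalar_a", "scalar_b")}
    _, f = tf.io.parse_single_sequence_example(example, sequence_features=spec)
    return f["list_a"], f["list_b"], f["scalar_a"], f["scalar_b"]

def to_sparse(list_a, list_b, scalar_a, scalar_b):
    rows = tf.zeros([N], dtype=tf.int64)
    ones = tf.ones([N], dtype=tf.int64)
    a = tf.sparse.SparseTensor(tf.stack([rows, list_a], 1), ones, [1, A_SIZE])
    b = tf.sparse.SparseTensor(tf.stack([rows, list_b], 1), ones, [1, B_SIZE])
    onehot = tf.sparse.concat(1, [a, b])  # shape [1, A_SIZE + B_SIZE]
    return tf.sparse.reshape(onehot, [A_SIZE + B_SIZE]), scalar_a, scalar_b

path = os.path.join(tempfile.mkdtemp(), "train.tfrecord")
write_records(path, num_records=8)

dataset = (tf.data.TFRecordDataset([path])
           .map(parse, num_parallel_calls=tf.data.AUTOTUNE)
           .map(to_sparse, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(4))  # tf.data batches SparseTensor components directly

x, sa, sb = next(iter(dataset))
print(x.dense_shape.numpy(), sa.shape)
```

Replacing the TFRecord source with a `tf.data.Dataset.from_tensors(...)` source of constant lists gives the non-leaking control described above, with everything downstream left identical.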

Related topics
- Tensorflow memory leak in loop (TensorFlow; keras, memory, gpu): 1 reply, 702 views, January 2, 2024
- Tensorflow memory leak during inference in loop (General Discussion; model-code, tfkeras): 2 replies, 348 views, March 6, 2024
- Memory Leak when calling fit() often (JVM; help_request): 2 replies, 2470 views, May 25, 2022
- Reference for tf.sparse.to_dense (General Discussion; help_request): 1 reply, 336 views, March 21, 2024
- Error on predict() - High memory usage in GPU: 1179.94 MB, most likely due to a memory leak (TensorFlow; tfjs, memory, tflite, gpu): 3 replies, 1504 views, June 11, 2023