Hi,
I came across a weird problem when reading TFRecord files from S3 through tf.data and caching them to a local path. Here is my reading code:
import tensorflow as tf

filenames = ['s3://path1', 's3://path2']  # placeholder S3 URIs
dataset = tf.data.TFRecordDataset(filenames, compression_type="GZIP")
parsed_dataset = (
    dataset.batch(batch_size, num_parallel_calls=tf.data.AUTOTUNE)
    .map(decode, num_parallel_calls=tf.data.AUTOTUNE)
    .cache(cache_file_path)   # cache to a local file path, not to memory
    .prefetch(tf.data.AUTOTUNE)
)
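My understanding is that cache() with no argument keeps elements in RAM, while passing a filename should spill them to disk under that path. A minimal sketch of what I expect, reusing dataset, decode, batch_size, and cache_file_path from the snippet above:

# In-memory cache: elements are held in RAM after the first pass over the data.
mem_cached = dataset.batch(batch_size).map(decode).cache()

# File-based cache: elements should be written to disk under cache_file_path
# rather than kept in RAM.
disk_cached = dataset.batch(batch_size).map(decode).cache(cache_file_path)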
It is very strange that cache() with a file path still consumes host memory, which eventually results in an OOM. Here is the memory usage I printed via a callback during training:
2022-03-08T22:19:40.154191003Z ...Training: end of batch 15700; got log keys: ['loss', 'copc', 'auc']
2022-03-08T22:19:40.159188560Z total memory: 59.958843GB
2022-03-08T22:19:40.159223737Z available memory: 8.418320GB
2022-03-08T22:19:40.159250296Z used memory: 50.959393GB
2022-03-08T22:19:40.159257814Z percent of used memory: 86.000000
2022-03-08T22:19:40.159263710Z free memory: 1.072124GB
2022-03-08T22:19:47.752077011Z Tue Mar 8 22:19:47 UTC 2022 job-submitter: job run error: signal: killed
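For context, the numbers above come from a Keras callback roughly like the following sketch (psutil-based; the class and names here are my reconstruction, not the exact code):

import psutil
import tensorflow as tf

GB = 1024 ** 3

class MemoryLogger(tf.keras.callbacks.Callback):
    # Rough reconstruction of the callback that printed the numbers above.
    def on_train_batch_end(self, batch, logs=None):
        if batch % 100 != 0:
            return
        print(f"...Training: end of batch {batch}; got log keys: {list((logs or {}).keys())}")
        mem = psutil.virtual_memory()
        print(f"total memory: {mem.total / GB:.6f}GB")
        print(f"available memory: {mem.available / GB:.6f}GB")
        print(f"used memory: {mem.used / GB:.6f}GB")
        print(f"percent of used memory: {mem.percent:.6f}")
        print(f"free memory: {mem.free / GB:.6f}GB")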
I have tested the same code on TF 2.3, which does not have this issue, but TF 2.5 and later versions run out of memory as shown above. I am not sure whether this is a bug or a configuration problem. Could anyone help answer this or give some clues about the problem?