Memory Management in TFDS

Goktug_Guvercin · April 21, 2023, 2:25am

Hello TensorFlow community,

While using the TFDS module, I got confused about its memory management. I have the following small code block:

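import tensorflow_datasets as tfds

num_epochs = 10   # placeholder value; not given in the original post
batch_size = 32   # placeholder value; not given in the original post

# tfds.load() returns tf.data.Dataset objects
train_ds = tfds.load("cifar10", split="train")
test_ds = tfds.load("cifar10", split="test")

train_ds = train_ds.repeat(num_epochs).shuffle(1024)
train_ds = train_ds.batch(batch_size, drop_remainder=True).prefetch(1)

for sample in tfds.as_numpy(train_ds):
    image, label = sample['image'], sample['label']
    print(image.shape)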

As far as I know, when we call tfds.load(), we create a builder, download and prepare the data, and get it back as a tf.data.Dataset. What I am wondering is whether the samples (images and labels) are also loaded into RAM when we call tfds.load(). If not, when are they loaded into RAM: during batching and prefetching, or during iteration?

chunduriv · April 21, 2023, 7:34am

@Goktug_Guvercin,

Welcome to the TensorFlow Forum!

No, the samples (images and labels) are not loaded into memory during tfds.load(). They are loaded into memory during iteration over train_ds.

The prefetch() method prepares upcoming batches in the background while the model is processing the current batch; with prefetch(1), one batch is kept ready ahead of time.
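
To make the lazy behaviour concrete, here is a minimal pure-Python sketch of the same idea. It is only an analogy built from generators, not how tf.data is actually implemented; the names read_samples, shuffle_buffer, batch and loaded are hypothetical helpers made up for illustration. It shows that building the pipeline reads nothing, and that pulling the first batch reads only enough samples to fill the shuffle buffer and the batch.

```python
import random
from itertools import islice

loaded = []  # indices of samples that have actually been "read from disk"

def read_samples(n):
    """Stand-in for reading records from disk: yields one sample at a time."""
    for i in range(n):
        loaded.append(i)  # record that this sample is now in memory
        yield i

def shuffle_buffer(samples, buffer_size, seed=0):
    """Hold a bounded number of samples (at most buffer_size + 1) and emit them in random order."""
    rng = random.Random(seed)
    buffer = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) > buffer_size:
            yield buffer.pop(rng.randrange(len(buffer)))
    while buffer:  # source exhausted: drain what is left
        yield buffer.pop(rng.randrange(len(buffer)))

def batch(samples, batch_size):
    """Group consecutive samples into lists of batch_size, dropping the remainder."""
    samples = iter(samples)
    while True:
        chunk = list(islice(samples, batch_size))
        if len(chunk) < batch_size:
            return
        yield chunk

# Building the pipeline only chains generators; no sample is read here.
pipeline = batch(shuffle_buffer(read_samples(100), buffer_size=8), batch_size=4)
loaded_after_build = len(loaded)          # 0

# Samples are read only when the pipeline is iterated.
first_batch = next(pipeline)
loaded_after_first_batch = len(loaded)    # 12 of the 100 samples

print(loaded_after_build, loaded_after_first_batch, first_batch)
```

In this sketch, only 12 of the 100 samples have been read after the first batch: 9 to overfill the buffer of 8 and emit one sample, plus 3 more for the remaining batch elements. In the same spirit, shuffle(1024) in the question keeps a buffer of 1024 samples in memory rather than the whole dataset.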

Thank you!
