Hi,
The following code, which evaluates a model in batches, rapidly exhausts my 128 GB of RAM within a couple of batches:
import numpy as np
import tensorflow as tf

BATCH_SIZE = 16384
delta = []
with tf.device("CPU"):
    dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(BATCH_SIZE)
    dataset = dataset.map(unpackbits_tf)
    for dataset_features, dataset_labels in dataset:
        predict = model.predict(dataset_features, batch_size=BATCH_SIZE)
        # Collect per-element absolute errors across all batches
        diff = np.abs(predict - dataset_labels.numpy())
        delta.extend(diff.flatten())
        print(len(delta))
The unpackbits_tf map function expands each record's 24 packed 8-bit integers into 192 0/1 integers; a sketch of it follows the model definition below. The model is:
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Dense(192, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
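For reference, unpackbits_tf does roughly the following (a minimal sketch, assuming the features arrive as a (batch, 24) uint8 tensor; my real function may differ in details such as bit order, but the shapes are the same):

def unpackbits_tf(features, labels):
    # Unpack each uint8 into its 8 bits: (batch, 24) -> (batch, 192) of 0/1 values
    masks = tf.constant([128, 64, 32, 16, 8, 4, 2, 1], dtype=features.dtype)
    bits = tf.bitwise.bitwise_and(features[..., tf.newaxis], masks)  # (batch, 24, 8)
    bits = tf.cast(bits > 0, tf.float32)
    return tf.reshape(bits, (tf.shape(features)[0], 192)), labels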
The test set consists of 2M records. If I change the line:
predict = model.predict(dataset_features, batch_size=BATCH_SIZE)
to:
predict = np.array(model.predict(dataset_features, batch_size=BATCH_SIZE).flatten())
the code behaves as expected and all 123 batches finish within a couple of seconds.
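In case it's useful for narrowing this down, this is the kind of single-batch check I can run (same names as above, purely diagnostic):

with tf.device("CPU"):
    for dataset_features, dataset_labels in dataset.take(1):
        predict = model.predict(dataset_features, batch_size=BATCH_SIZE)
        # Compare the shapes of the raw predictions, the labels, and their difference
        print(predict.shape, dataset_labels.shape, (predict - dataset_labels.numpy()).shape)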
What is the reason for the massive memory usage?
Regards,
GW