Hello.
I have a dataset with 48k training and 12k validation images, each roughly 1 MB.
The model is a U-Net variant.
My machine is an i5 CPU, 64 GB RAM, NVIDIA GeForce RTX 3080 Ti.
The problem is memory: if my batch size is greater than 2, I get out-of-memory errors.
I load the datasets with tf.keras.preprocessing.image_dataset_from_directory.
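For context, the loading call looks roughly like this (a sketch; the path, image size, and batch size below are placeholders, not my exact values):

```python
import tensorflow as tf

# Placeholder path/size/batch values, not the exact ones from my project
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "data/train",
    label_mode=None,        # the real call also deals with labels/masks
    image_size=(512, 512),
    batch_size=2,
)
```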
I use:

```python
import tensorflow as tf

# Let TensorFlow allocate GPU memory on demand instead of grabbing it all up front
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        print(e)
```
Is this normal behaviour for my dataset size, or is my data pipeline badly set up?
I could only train my model on 4 A100 GPUs, using:

```python
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    # model is built and compiled inside the strategy scope
    ...
```

In that case I can use a batch size of 32.
Hi @Alexander_Tov,

Your memory issue when training a U-Net model with a batch size greater than 2 on your local machine may be due to the large size of the images (about 1 MB each) and the U-Net architecture, which is known to be memory-intensive because of its multiple layers and operations.

Here are a few suggestions addressing the potential issues:
- Optimize Data Pipeline: Use the tf.data.Dataset API for more control and optimization. Enable data prefetching and parallel data loading (see the first sketch after this list).
- Reduce Batch Size: Stick with a smaller batch size if memory is constrained. A batch size of 2 might be necessary given your current setup.
- Model Optimization: Reduce the number of layers or filters in your U-Net model. Use mixed precision training to reduce memory usage (a mixed-precision sketch also follows the list).
- Memory Growth: Ensure that TensorFlow is configured to allow memory growth as you’ve done.
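For the data pipeline point, here is a minimal sketch. The optimize_pipeline helper and the normalization step are illustrative, and it assumes the dataset yields images only; adapt the map function if your dataset yields (image, mask) pairs:

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

def optimize_pipeline(ds):
    """Add parallel preprocessing and prefetching to an image dataset."""
    return (
        ds
        # Example preprocessing step; assumes the dataset yields images only
        .map(lambda x: tf.cast(x, tf.float32) / 255.0,
             num_parallel_calls=AUTOTUNE)
        # Prepare the next batches on the CPU while the GPU is busy training
        .prefetch(AUTOTUNE)
    )

# train_ds / val_ds come from image_dataset_from_directory
# train_ds = optimize_pipeline(train_ds)
# val_ds = optimize_pipeline(val_ds)
```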
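And for mixed precision, enabling it is a small change. The sketch below uses a placeholder model (your U-Net goes where the Conv2D layers are); the input shape, layers, and sigmoid output are assumptions. Keep the final layer in float32 so the loss is computed in full precision:

```python
import tensorflow as tf
from tensorflow.keras import mixed_precision

# Most ops run in float16; variables stay in float32
mixed_precision.set_global_policy("mixed_float16")

# Placeholder model; replace with your U-Net
inputs = tf.keras.Input(shape=(512, 512, 3))
x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
# ... rest of the U-Net ...
# Output layer forced to float32 for numerical stability
outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid", dtype="float32")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```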
Hope this helps,
Thank you.