Hello.
I have a dataset with 48k training and 12k validation images, each roughly 1 MB.
The model is a U-Net variant.
My machine is an i5 CPU, 64 GB RAM, NVIDIA GeForce RTX 3080 Ti.
The problem is memory: if my batch size is greater than 2, I get out-of-memory errors.
I load the datasets with tf.keras.preprocessing.image_dataset_from_directory.
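For context, the loading call looks roughly like this (a sketch; the path, image size, and batch size below are placeholders, not my exact values):

```python
import tensorflow as tf

# Placeholder path/size/batch values, not the exact ones from my project
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "data/train",
    label_mode=None,        # the real call also deals with labels/masks
    image_size=(512, 512),
    batch_size=2,
)
```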
I use:

```python
import tensorflow as tf

# Let TensorFlow allocate GPU memory on demand instead of grabbing it all up front
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        print(e)
```
Is this normal behaviour for my dataset size, or is my data pipeline badly set up?
I could only train my model on 4 A100 GPUs, using:

```python
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    # model is built and compiled inside the strategy scope
    ...
```

In that case I can use a batch size of 32.
Hi @Alexander_Tov,

Your memory issue when training a U-Net model with a batch size greater than 2 on your local machine may be due to the large size of the images (about 1 MB each) and the U-Net architecture, which is known to be memory-intensive because of its multiple layers and operations.

Here are a few suggestions addressing the potential issues:
- Optimize Data Pipeline: Use the tf.data.Dataset API for more control and optimization. Enable data prefetching and parallel data loading (see the first sketch after this list).
- Reduce Batch Size: Stick with a smaller batch size if memory is constrained. A batch size of 2 might be necessary given your current setup.
- Model Optimization: Reduce the number of layers or filters in your U-Net model. Use mixed precision training to reduce memory usage (a mixed-precision sketch also follows the list).
- Memory Growth: Ensure that TensorFlow is configured to allow memory growth as you’ve done.
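For the data pipeline point, here is a minimal sketch. The optimize_pipeline helper and the normalization step are illustrative, and it assumes the dataset yields images only; adapt the map function if your dataset yields (image, mask) pairs:

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

def optimize_pipeline(ds):
    """Add parallel preprocessing and prefetching to an image dataset."""
    return (
        ds
        # Example preprocessing step; assumes the dataset yields images only
        .map(lambda x: tf.cast(x, tf.float32) / 255.0,
             num_parallel_calls=AUTOTUNE)
        # Prepare the next batches on the CPU while the GPU is busy training
        .prefetch(AUTOTUNE)
    )

# train_ds / val_ds come from image_dataset_from_directory
# train_ds = optimize_pipeline(train_ds)
# val_ds = optimize_pipeline(val_ds)
```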
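And for mixed precision, enabling it is a small change. The sketch below uses a placeholder model (your U-Net goes where the Conv2D layers are); the input shape, layers, and sigmoid output are assumptions. Keep the final layer in float32 so the loss is computed in full precision:

```python
import tensorflow as tf
from tensorflow.keras import mixed_precision

# Most ops run in float16; variables stay in float32
mixed_precision.set_global_policy("mixed_float16")

# Placeholder model; replace with your U-Net
inputs = tf.keras.Input(shape=(512, 512, 3))
x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
# ... rest of the U-Net ...
# Output layer forced to float32 for numerical stability
outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid", dtype="float32")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```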
Hope this helps,
Thank you.