I am a student researcher trying to train a CNN model. When training starts, my Jupyter notebook cell hangs right after mapping the dataset on the CPU. In a native Windows CPU-only environment, the same code runs fine. I downloaded the tested TF 2.16.1 build for Ubuntu using Anaconda and use that environment in PyCharm. I am running the code on an MSI GF63 Thin 11UD laptop with debug logging enabled. The laptop has an NVIDIA RTX 3050 GPU, and nvidia-smi shows no processes running on it.
I also reformatted my device and installed fresh drivers and everything else to try to fix the problem, but I honestly have no idea what else to try or what is causing this. The hang also occurs with other model architectures, and RNNs fail with a completely different issue. However, simple Dense networks run just fine.
I have been stuck on this problem for 2 months and have spent hours with a professor, ChatGPT, and even a senior DevOps engineer trying to figure out what is going on, so any help is appreciated!
Then modify your code to explicitly set memory growth:
import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
if physical_devices:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
This configuration makes TensorFlow allocate GPU memory on demand instead of reserving most of it up front, which should resolve the hanging issue on your RTX 3050. For best results, also make sure you are using at least CUDA 11.8 and cuDNN 8.6, and that these versions match the ones your TensorFlow build was tested against. If this does not help, just let me know.
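Since simple Dense networks train fine but CNNs hang, it may also help to check whether a single convolution completes on the GPU on its own, with no tf.data pipeline involved. The sketch below is a minimal, hypothetical smoke test: `conv_smoke_test` and the tensor shapes are arbitrary choices, not taken from the original notebook, and it assumes it is run in the same environment as the notebook.

```python
import time

try:
    import tensorflow as tf
except ImportError:  # report clearly if run outside the TensorFlow environment
    tf = None


def conv_smoke_test(batch=8, size=64, channels=3, filters=16):
    """Run one Conv2D forward pass on random data, bypassing tf.data entirely.

    Returns the number of seconds the convolution took.
    """
    if tf is None:
        raise RuntimeError("TensorFlow is not installed in this environment")
    # Use the GPU if TensorFlow can see one, otherwise fall back to the CPU.
    device = "/GPU:0" if tf.config.list_physical_devices("GPU") else "/CPU:0"
    print("running on", device, flush=True)
    with tf.device(device):
        x = tf.random.normal([batch, size, size, channels])
        kernel = tf.random.normal([3, 3, channels, filters])
        start = time.perf_counter()
        y = tf.nn.conv2d(x, kernel, strides=1, padding="SAME")
        _ = y.numpy()  # force the computation to finish before stopping the clock
        elapsed = time.perf_counter() - start
    print(f"conv2d output shape {tuple(y.shape)} in {elapsed:.3f}s", flush=True)
    return elapsed
```

If this call never returns when a GPU is visible, that would point toward the CUDA/cuDNN layer rather than the dataset code; if it finishes quickly, the data pipeline becomes the more likely suspect.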
I have already tried the second code block in the past. Would you mind explaining what the first one does? I tried adding it to my Jupyter notebook, and it doesn't seem to be valid Python.
I am running CUDA 12.3 and cuDNN 8.9.7. Your code block gives the same behavior; I left it running to see if anything would change, and it can run for 8+ hours without getting past this FlatMapDataset step.
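When a cell sits for hours with no visible progress, one way to see where Python is actually waiting is the standard library's `faulthandler` module, which can dump every thread's traceback on a timer. This is a minimal sketch; `hang_watchdog` and the log path are hypothetical names. Note that `faulthandler` only shows Python frames: if the hang is inside TensorFlow's C++ runtime, the dump shows the Python call that entered it (for example `model.fit`), not the native code.

```python
import contextlib
import faulthandler
import time


@contextlib.contextmanager
def hang_watchdog(log_path="hang_traceback.log", timeout_s=600.0):
    """Dump the Python traceback of every thread to log_path every
    timeout_s seconds for as long as the with-block is running."""
    start = time.perf_counter()
    # faulthandler writes through the file descriptor, so a real file is used;
    # this also lets you read the dumps from a terminal while the cell is stuck.
    with open(log_path, "w") as log_file:
        faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)
        try:
            yield log_path
        finally:
            # Disarm the timer before the file is closed.
            faulthandler.cancel_dump_traceback_later()
            elapsed = time.perf_counter() - start
            print(f"watched block ended after {elapsed:.1f}s", flush=True)
```

Usage: wrap the training call, e.g. `with hang_watchdog(timeout_s=300): model.fit(X_train, y_train)`, where `model` stands for your compiled model, then inspect `hang_traceback.log` while the cell is stuck.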
Those are environment variables that need to be set in your terminal/shell before launching Jupyter, not directly in the notebook. Here’s how to set them:
export TF_GPU_ALLOCATOR=cuda_malloc_async
This command tells TensorFlow to use an asynchronous memory allocator for better GPU memory management.
For Windows/WSL2, you can add these to your ~/.bashrc file for persistence, or set them before starting Jupyter:
jupyter notebook
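Because `export` is shell syntax, it will not run in a Python cell. If restarting Jupyter from a shell is inconvenient, the same variable can be set from Python through `os.environ` (IPython's `%env TF_GPU_ALLOCATOR=cuda_malloc_async` magic does the same thing). A minimal sketch; `set_env_before_tf` is a hypothetical helper, and it assumes it runs in the first cell, before TensorFlow initializes the GPU. Setting it before importing TensorFlow is the safest order.

```python
import os
import sys
import warnings


def set_env_before_tf(name, value):
    """Set an environment variable meant for TensorFlow from inside a notebook."""
    if "tensorflow" in sys.modules:
        # Once the GPU has been initialized, the allocator cannot be changed
        # without restarting the kernel.
        warnings.warn(
            f"tensorflow is already imported; if the GPU has been initialized, "
            f"restart the kernel for {name} to take effect",
            RuntimeWarning,
        )
    os.environ[name] = value


set_env_before_tf("TF_GPU_ALLOCATOR", "cuda_malloc_async")
# Only after this: import tensorflow as tf
```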
The long time spent in FlatMapDataset suggests a potential data pipeline bottleneck; consider checking your dataset preprocessing steps and batch size.
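To check whether the input pipeline on its own is the bottleneck, one option is to pull a few batches from the dataset without touching the model. A minimal sketch; `time_batches` is a hypothetical helper that works with any iterable, including a `tf.data.Dataset` such as `training_data`.

```python
import time


def time_batches(dataset, max_batches=20):
    """Pull up to max_batches batches from any iterable without training,
    printing and returning how long each batch took to arrive (seconds)."""
    timings = []
    start = time.perf_counter()
    for i, _batch in enumerate(dataset):
        elapsed = time.perf_counter() - start
        timings.append(elapsed)
        print(f"batch {i}: {elapsed:.3f}s", flush=True)
        if len(timings) >= max_batches:
            break
        start = time.perf_counter()
    return timings
```

For example, `time_batches(training_data, max_batches=10)`: if nothing prints for batch 0, the pipeline itself is stuck; if batches arrive quickly but training still hangs, the model or GPU side is the more likely suspect.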
I tried your solution by adding the 2 lines to my .bashrc, and I reduced the training batch size down to 2 and the batch size in image_dataset_from_directory to 8. There was no change in behavior, except for 1 run with a larger training batch size, which got through the EagerConst operation only twice before hanging again.
Any ideas on how to proceed? Here is the dataset creation code, which runs after getting a tf.data.Dataset object:
import numpy as np
from sklearn.model_selection import train_test_split

# Function to extract features (X) and labels (y) from a dataset
def dataset_to_numpy(dataset):
    images = []
    labels = []
    for image_batch, label_batch in dataset:
        images.append(image_batch.numpy())  # Convert image tensors to NumPy arrays
        print(f"images with labels {label_batch} appended")  # printing the process for the sake of sanity
        labels.append(label_batch.numpy())  # Convert label tensors to NumPy arrays
    return np.concatenate(images), np.concatenate(labels)

# Extract training data
X_train, y_train = dataset_to_numpy(training_data)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.3, random_state=42)
# Extract test data
X_test, y_test = dataset_to_numpy(test_data)
training_data and test_data are the Dataset objects. Once the datasets have been stored in NumPy arrays, I normalize the values, since they are pixel values.
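Because the whole dataset is already held in RAM at this point, the dtype of the normalized arrays matters. A minimal sketch, assuming the arrays hold 0-255 pixel values; `normalize_pixels` is a hypothetical helper. Casting explicitly to float32 keeps memory at 4 bytes per value, whereas dividing a uint8 array by `255.0` would produce float64 (8 bytes per value). Keras' `tf.keras.layers.Rescaling(1./255)` layer is an alternative that does the scaling inside the model instead.

```python
import numpy as np


def normalize_pixels(images):
    """Scale 0-255 pixel values to [0, 1] as float32."""
    images = np.asarray(images)
    # Cast first so the division stays in float32 instead of upcasting to float64.
    return images.astype(np.float32) / np.float32(255.0)
```

Usage: `X_train = normalize_pixels(X_train)`, and likewise for `X_val` and `X_test`.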