GPU Support w/ Windows 11, WSL2, & PyCharm

I am a student researcher trying to train a CNN. When training starts, my Jupyter notebook cell hangs after the dataset has been mapped on the CPU. In a native Windows CPU-only environment, the same code runs fine. I installed the tested TF 2.16.1 build for Ubuntu using Anaconda and imported the environment into PyCharm. I am running this on an MSI GF63 Thin 11UD with debug logging enabled. The device has an NVIDIA RTX 3050 GPU, and nvidia-smi shows no processes running on it.

I also reformatted my device and installed fresh drivers to try to fix the problem, but I honestly have no clue what else to try or what is causing this. The hang also occurs with other model architectures, and I get a completely different issue when trying to run RNNs. However, simple Dense networks run just fine.

I have been at this problem for 2 months, spent hours with a professor and ChatGPT and even a senior DevOps engineer to try and figure out what could be going on, so any help is appreciated!

Try the following for your CNN training issue:

export TF_GPU_ALLOCATOR=cuda_malloc_async
export CUDA_VISIBLE_DEVICES=0

Then modify your code to explicitly set memory growth:

import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
if physical_devices:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

This configuration may resolve the hang by changing how GPU memory is allocated on your RTX 3050. Also make sure your CUDA and cuDNN versions match the ones your TensorFlow build was tested against (for TF 2.16.1, that is CUDA 12.3 and cuDNN 8.9). If this does not help, just let me know.
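Before tuning the allocator, it can help to confirm that the TensorFlow in the notebook's kernel is a GPU build and can see the card at all. The snippet below is only a rough diagnostic sketch: `gpu_report` is a hypothetical helper name, and the import is wrapped in a try/except so the same cell also tells you when the PyCharm or Jupyter interpreter is not the Anaconda environment you expect.

```python
def gpu_report():
    """Collect basic facts about the TensorFlow build and visible GPUs."""
    try:
        import tensorflow as tf
    except ImportError:
        # The active interpreter has no TensorFlow at all.
        return {"tensorflow_installed": False, "gpus": []}

    build = tf.sysconfig.get_build_info()  # dict-like; CUDA keys are absent on CPU builds
    return {
        "tensorflow_installed": True,
        "tf_version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


report = gpu_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `gpus` comes back empty while nvidia-smi still lists the card, the problem is more likely the environment or driver setup than the model code.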

Hi KRows,

I have already tried the second code block in the past. Would you mind explaining what the first one does? I tried adding it to my Jupyter notebook and it doesn’t seem to be valid Python.

I am running CUDA 12.3 and cuDNN 8.9.7. Your code block gives the same behavior, and I left it running to see if anything would change. It can run for 8+ hours without getting past this FlatMapDataset step.

Thanks!

Those are environment variables that need to be set in your terminal/shell before launching Jupyter, not directly in the notebook. Here’s how to set them:

export TF_GPU_ALLOCATOR=cuda_malloc_async

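This tells TensorFlow to use CUDA's asynchronous allocator (cudaMallocAsync) instead of its default BFC allocator, which can reduce GPU memory fragmentation. The other variable, CUDA_VISIBLE_DEVICES=0, restricts TensorFlow to the first GPU.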

On WSL2, you can add these lines to your ~/.bashrc file for persistence, or set them on the same command line that starts Jupyter:

TF_GPU_ALLOCATOR=cuda_malloc_async CUDA_VISIBLE_DEVICES=0 jupyter notebook
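If you would rather stay inside the notebook, the same variables can be set from Python with `os.environ`, as long as that cell runs before TensorFlow initializes the GPU (doing it before `import tensorflow` is the safe choice). This is a minimal sketch; `configure_gpu_env` is a hypothetical helper name, not a TensorFlow API.

```python
import os


def configure_gpu_env(allocator="cuda_malloc_async", visible_devices="0"):
    """Set the GPU-related environment variables for this Python process."""
    os.environ["TF_GPU_ALLOCATOR"] = allocator
    os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices
    # Return what was set so the notebook cell shows it.
    return {
        name: os.environ[name]
        for name in ("TF_GPU_ALLOCATOR", "CUDA_VISIBLE_DEVICES")
    }


settings = configure_gpu_env()
print(settings)

# Import TensorFlow only after the variables are in place, e.g.:
# import tensorflow as tf
```

If the kernel has already imported TensorFlow, restart it first. Otherwise the new values may not be picked up.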

The long FlatMapDataset processing suggests a potential data pipeline bottleneck - consider checking your dataset preprocessing steps and batch size.
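One rough way to check for an input-pipeline bottleneck is to time how long each batch takes to come out of the dataset, without any model involved. The sketch below works on any iterable, so a `tf.data.Dataset` can be passed in directly. `time_batches` is a hypothetical helper, and the list used here is only a stand-in for a real dataset.

```python
import time


def time_batches(dataset, max_batches=10):
    """Return the seconds spent fetching each of the first `max_batches` batches."""
    durations = []
    iterator = iter(dataset)
    for _ in range(max_batches):
        start = time.perf_counter()
        try:
            next(iterator)  # pull one batch through the pipeline
        except StopIteration:
            break  # dataset had fewer batches than requested
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in data: five tiny "batches" in a plain list.
demo_batches = [[0] * 4 for _ in range(5)]
timings = time_batches(demo_batches, max_batches=3)
print([f"{t:.6f}s" for t in timings])
```

If the first batch alone takes minutes, the hang is in data loading or preprocessing rather than in the GPU training step.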

Hey KRows,

I tried your solution and added the two lines to my ~/.bashrc. I also reduced the training batch size to 2 and set the batch size in image_dataset_from_directory to 8. There was no change in behavior, except for one run with a larger training batch size, which only got through the EagerConst operation twice before hanging again.

Any ideas how to proceed? Here is the dataset creation after getting a tf.data.Dataset object:

import numpy as np
from sklearn.model_selection import train_test_split

# Function to extract features (X) and labels (y) from a dataset
def dataset_to_numpy(dataset):
    images = []
    labels = []
    for image_batch, label_batch in dataset:
        images.append(image_batch.numpy())  # Convert image tensors to NumPy arrays
        print(f"images with labels {label_batch} appended") # printing the process for the sake of sanity
        labels.append(label_batch.numpy())  # Convert label tensors to NumPy arrays
    return np.concatenate(images), np.concatenate(labels)

# Extract training data
X_train, y_train = dataset_to_numpy(training_data)

X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.3, random_state=42)

# Extract test data
X_test, y_test = dataset_to_numpy(test_data)

training_data and test_data are the Dataset objects. Once the datasets have been stored in NumPy arrays, I normalize them, since they hold pixel values.

# e.g.
X_train = X_train.astype('float32')/ 255.0

Thanks!

I figured out that I needed to cache the dataset and convert it back to a tf.data.Dataset object! Problem resolved. Thanks for the insights.
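For anyone landing here later, the fix looked roughly like the sketch below. It is not my exact code: `make_dataset` is a hypothetical helper, the random arrays stand in for the normalized X_train / y_train from above, and the batch size and shuffle settings are assumptions.

```python
import numpy as np
import tensorflow as tf


def make_dataset(features, labels, batch_size=8, shuffle=False, seed=42):
    """Wrap NumPy arrays in a cached, batched tf.data.Dataset."""
    ds = tf.data.Dataset.from_tensor_slices((features, labels))
    ds = ds.cache()  # keep elements in memory after the first pass
    if shuffle:
        ds = ds.shuffle(buffer_size=len(features), seed=seed)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)


# Stand-in arrays with the same layout as normalized image data.
X_demo = np.random.rand(20, 32, 32, 3).astype("float32")
y_demo = np.random.randint(0, 2, size=20)

train_ds = make_dataset(X_demo, y_demo, batch_size=8, shuffle=True)
for image_batch, label_batch in train_ds.take(1):
    print(image_batch.shape, label_batch.shape)
```

The resulting dataset is then passed straight to training, e.g. model.fit(train_ds, validation_data=val_ds, ...), instead of passing the raw NumPy arrays.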