I’m trying to install TF with GPU support on Ubuntu 24.04.
nvidia-smi
shows:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02 Driver Version: 555.42.02 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 2080 Off | 00000000:01:00.0 On | N/A |
| 0% 51C P8 26W / 225W | 986MiB / 8192MiB | 39% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2716 G /usr/lib/xorg/Xorg 225MiB |
| 0 N/A N/A 3031 G /usr/bin/gnome-shell 72MiB |
| 0 N/A N/A 3964 G ...irefox/4259/usr/lib/firefox/firefox 0MiB |
| 0 N/A N/A 6726 G /usr/bin/gnome-text-editor 9MiB |
| 0 N/A N/A 9168 G ...erProcess --variations-seed-version 79MiB |
+-----------------------------------------------------------------------------------------+
pip install tensorflow[and-cuda] finishes with:
Successfully installed tensorflow-2.16.1
But
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
returns:
2024-05-21 17:09:20.348427: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-21 17:09:20.965066: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-05-21 17:09:21.653899: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-05-21 17:09:21.702216: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
From all this I’m guessing that TF with GPU support is installed, but I’m missing some step that lets Python find the GPU libraries.
Try with these versions, Python 3.11 and TF 2.15.1, and let me know the results:
conda create -n tmpenv python=3.11
conda activate tmpenv
pip install tensorflow[and-cuda]==2.15.1
I’m an idiot and missed a step. Once I installed cuDNN SDK 8.6.0
AND ran:
CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib
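For reference, the CUDNN_PATH line just resolves the directory of the pip-installed nvidia.cudnn package. The same lookup can be sketched in Python (guarded, since the package may not be present in every environment):

```python
import importlib.util
import os

# Locate the pip-installed nvidia.cudnn package, as the shell one-liner does
try:
    spec = importlib.util.find_spec("nvidia.cudnn")
except ModuleNotFoundError:  # the parent "nvidia" package is absent
    spec = None

if spec is not None and spec.origin:
    cudnn_path = os.path.dirname(spec.origin)   # what $CUDNN_PATH ends up as
    lib_dir = os.path.join(cudnn_path, "lib")   # the directory appended to LD_LIBRARY_PATH
    print(lib_dir)
else:
    print("nvidia-cudnn is not installed in this environment")
```

If this prints a path, that is the lib directory the export above is adding; if it prints the fallback message, the pip package never made it into the active environment.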
I was able to run
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
and get:
2024-05-22 10:43:36.388084: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-22 10:43:36.959889: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-05-22 10:43:37.602841: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-05-22 10:43:37.640749: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-05-22 10:43:37.646399: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
So it’s seeing the GPU. Yay! But when I try to run section 2.1 of lab1/Part2_Music_Generation.ipynb from the aamini/introtodeeplearning repo on GitHub,
it exits with:
Note: you may need to restart the kernel to use updated packages.
2024-05-21 20:57:20.596732: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-21 20:57:21.239434: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Note: you may need to restart the kernel to use updated packages.
2024-05-21 20:57:23.287808: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-05-21 20:57:23.329298: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[2], line 28
25 # Check that we are using a GPU, if not switch runtimes
26 # using Runtime > Change Runtime Type > GPU
27 print(tf.config.list_physical_devices())
---> 28 assert len(tf.config.list_physical_devices('GPU')) > 0
29 assert COMET_API_KEY != "", "Please insert your Comet API Key"
AssertionError:
So now it looks like I am setting it up correctly in the virtual environment in the CLI, but some crucial piece isn’t making it to the Python Jupyter notebook in VSC.
@Mohan_Krishna_G_R
I’ve tried your suggestion and I get the same result.
I can run the test in the CLI and it reports a GPU but, even when I choose that env and restart VSC, it only finds the CPU.
I added
print(tf.config.list_physical_devices())
to the code and it outputs:
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
When I open ‘Terminal->New Terminal’ in VSC I can try:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
and it does manage to find the GPU.
I’m starting to suspect this may not actually be a TF issue and may instead be a VSC issue. Please let me know if I should be reposting this question elsewhere.
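One way to test the VSC-vs-TF hypothesis is to run the same small check in both places (the integrated terminal and a notebook cell) and compare. This is just a generic diagnostic, nothing TF-specific:

```python
import os
import sys

# Which interpreter is this process actually running?
interpreter = sys.executable

# Did the activate-time exports reach this process?
ld_path = os.environ.get("LD_LIBRARY_PATH", "<not set>")

print("interpreter:", interpreter)
print("LD_LIBRARY_PATH:", ld_path)
```

If the notebook prints a different interpreter than the terminal, or "<not set>" for LD_LIBRARY_PATH, VSC started the kernel without sourcing the environment’s activate script, which would explain the CPU-only result.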
Welcome @aprentic to the TensorFlow Community
You can try the following:
- Create a fresh conda virtual environment and activate it.
- Upgrade pip:
pip install --upgrade pip
- Install TensorFlow with CUDA support:
pip install tensorflow[and-cuda]
- Set environment variables:
Locate the conda environment’s directory by running in the terminal:
echo $CONDA_PREFIX
Enter that directory and create these subdirectories and files:
cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh
Edit ./etc/conda/activate.d/env_vars.sh
as follows:
#!/bin/sh
# Store original LD_LIBRARY_PATH
export ORIGINAL_LD_LIBRARY_PATH="${LD_LIBRARY_PATH}"
# Get the CUDNN directory
CUDNN_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)")))
# Set LD_LIBRARY_PATH to include CUDNN directory
export LD_LIBRARY_PATH=$(find ${CUDNN_DIR}/*/lib/ -type d -printf "%p:")${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# Get the ptxas directory
PTXAS_DIR=$(dirname $(dirname $(python -c "import nvidia.cuda_nvcc; print(nvidia.cuda_nvcc.__file__)")))
# Set PATH to include the directory containing ptxas
export PATH=$(find ${PTXAS_DIR}/*/bin/ -type d -printf "%p:")${PATH:+:${PATH}}
Edit ./etc/conda/deactivate.d/env_vars.sh
as follows:
#!/bin/sh
# Restore original LD_LIBRARY_PATH
export LD_LIBRARY_PATH="${ORIGINAL_LD_LIBRARY_PATH}"
# Unset environment variables
unset CUDNN_DIR
unset PTXAS_DIR
- Verify the GPU setup:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
I have submitted a pull request to update the official TensorFlow installation guide.
I hope it helps!
You need not install cuDNN separately; it’s installed along with
pip install tensorflow[and-cuda]==2.15.1
as you can check:
$ conda list cudnn
# Name Version Build Channel
nvidia-cudnn-cu12 8.9.4.25 pypi_0 pypi
With no further changes to the build, running
print(tf.config.list_physical_devices('GPU'))
in the same ipynb returns:
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Also, the compatibility matrix is strict: changing versions in the build leads to conflicts. So try with exactly these versions, and let me know if it works:
conda create -n tmpenv python=3.11
conda activate tmpenv
pip install tensorflow[and-cuda]==2.15.1
Create a new env and try with that for now.
Thank you both for your help @Mohan_Krishna_G_R and @sotiris.gkouzias !
I finally figured it out.
When I added the environment lines above:
CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib
to the end of my
.venv/bin/activate
it worked.
Exporting them manually got it to work in the CLI, and putting them in the .venv activate script made them available within VSC.