I’m trying to install TF with GPU support on Ubuntu 24.04.
nvidia-smi
shows:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02 Driver Version: 555.42.02 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 2080 Off | 00000000:01:00.0 On | N/A |
| 0% 51C P8 26W / 225W | 986MiB / 8192MiB | 39% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2716 G /usr/lib/xorg/Xorg 225MiB |
| 0 N/A N/A 3031 G /usr/bin/gnome-shell 72MiB |
| 0 N/A N/A 3964 G ...irefox/4259/usr/lib/firefox/firefox 0MiB |
| 0 N/A N/A 6726 G /usr/bin/gnome-text-editor 9MiB |
| 0 N/A N/A 9168 G ...erProcess --variations-seed-version 79MiB |
+-----------------------------------------------------------------------------------------+
pip install tensorflow[and-cuda] finishes with:
Successfully installed tensorflow-2.16.1
But
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
returns:
2024-05-21 17:09:20.348427: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-21 17:09:20.965066: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-05-21 17:09:21.653899: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-05-21 17:09:21.702216: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
From all this I’m guessing that TF with GPU support is installed, but I’m missing some step that lets Python find the GPU libraries.
Try with these versions, Python 3.11 and TF 2.15.1, and let me know the results:
conda create -n tmpenv python=3.11
conda activate tmpenv
pip install tensorflow[and-cuda]==2.15.1
I’m an idiot and missed a step. Once I installed cuDNN SDK 8.6.0
AND ran:
CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib
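For reference, the CUDNN_PATH line just resolves the directory of the pip-installed nvidia.cudnn package. The same lookup can be sketched in Python (guarded, since the package may not be present in every environment):

```python
import importlib.util
import os

# Locate the pip-installed nvidia.cudnn package, as the shell one-liner does
try:
    spec = importlib.util.find_spec("nvidia.cudnn")
except ModuleNotFoundError:  # the parent "nvidia" package is absent
    spec = None

if spec is not None and spec.origin:
    cudnn_path = os.path.dirname(spec.origin)   # what $CUDNN_PATH ends up as
    lib_dir = os.path.join(cudnn_path, "lib")   # the directory appended to LD_LIBRARY_PATH
    print(lib_dir)
else:
    print("nvidia-cudnn is not installed in this environment")
```

If this prints a path, that is the lib directory the export above is adding; if it prints the fallback message, the pip package never made it into the active environment.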
I was able to run
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
and get:
2024-05-22 10:43:36.388084: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-22 10:43:36.959889: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-05-22 10:43:37.602841: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-05-22 10:43:37.640749: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-05-22 10:43:37.646399: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
So it’s seeing the GPU. Yay! But when I try to run section 2.1 of lab1/Part2_Music_Generation.ipynb from the aamini/introtodeeplearning repo on GitHub,
it exits with:
Note: you may need to restart the kernel to use updated packages.
2024-05-21 20:57:20.596732: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-21 20:57:21.239434: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Note: you may need to restart the kernel to use updated packages.
2024-05-21 20:57:23.287808: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-05-21 20:57:23.329298: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[2], line 28
25 # Check that we are using a GPU, if not switch runtimes
26 # using Runtime > Change Runtime Type > GPU
27 print(tf.config.list_physical_devices())
---> 28 assert len(tf.config.list_physical_devices('GPU')) > 0
29 assert COMET_API_KEY != "", "Please insert your Comet API Key"
AssertionError:
So now it looks like I am setting it up correctly in the virtual environment in the CLI, but some crucial piece isn’t making it to the Python Jupyter notebook in VSC.
@Mohan_Krishna_G_R
I’ve tried your suggestion and I get the same result.
I can run the test in the CLI and it reports a GPU but, even when I choose that env and restart VSC, it only finds the CPU.
I added
print(tf.config.list_physical_devices())
to the code and it outputs:
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
When I open ‘Terminal->New Terminal’ in VSC I can try:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
and it does manage to find the GPU.
I’m starting to suspect this may not actually be a TF issue and may instead be a VSC issue. Please let me know if I should be reposting this question elsewhere.
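One way to test the VSC-vs-TF hypothesis is to run the same small check in both places (the integrated terminal and a notebook cell) and compare. This is just a generic diagnostic, nothing TF-specific:

```python
import os
import sys

# Which interpreter is this process actually running?
interpreter = sys.executable

# Did the activate-time exports reach this process?
ld_path = os.environ.get("LD_LIBRARY_PATH", "<not set>")

print("interpreter:", interpreter)
print("LD_LIBRARY_PATH:", ld_path)
```

If the notebook prints a different interpreter than the terminal, or "<not set>" for LD_LIBRARY_PATH, VSC started the kernel without sourcing the environment’s activate script, which would explain the CPU-only result.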
Welcome @aprentic to the TensorFlow Community
You can try the following:
- Create a fresh conda virtual environment and activate it.
- Upgrade pip:
pip install --upgrade pip
- Install TensorFlow with CUDA support:
pip install tensorflow[and-cuda]
- Set environment variables:
Locate the conda environment’s directory by running in the terminal:
echo $CONDA_PREFIX
Enter that directory and create these subdirectories and files:
cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh
Edit ./etc/conda/activate.d/env_vars.sh
as follows:
#!/bin/sh
# Store original LD_LIBRARY_PATH
export ORIGINAL_LD_LIBRARY_PATH="${LD_LIBRARY_PATH}"
# Get the CUDNN directory
CUDNN_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)")))
# Set LD_LIBRARY_PATH to include CUDNN directory
export LD_LIBRARY_PATH=$(find ${CUDNN_DIR}/*/lib/ -type d -printf "%p:")${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# Get the ptxas directory
PTXAS_DIR=$(dirname $(dirname $(python -c "import nvidia.cuda_nvcc; print(nvidia.cuda_nvcc.__file__)")))
# Set PATH to include the directory containing ptxas
export PATH=$(find ${PTXAS_DIR}/*/bin/ -type d -printf "%p:")${PATH:+:${PATH}}
Edit ./etc/conda/deactivate.d/env_vars.sh
as follows:
#!/bin/sh
# Restore original LD_LIBRARY_PATH
export LD_LIBRARY_PATH="${ORIGINAL_LD_LIBRARY_PATH}"
# Unset environment variables
unset CUDNN_DIR
unset PTXAS_DIR
- Verify the GPU setup:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
I have submitted a pull request to update the official TensorFlow installation guide.
I hope it helps!
You need not install cuDNN separately; it’s installed along with
pip install tensorflow[and-cuda]==2.15.1
as you can check:
$ conda list cudnn
# Name Version Build Channel
nvidia-cudnn-cu12 8.9.4.25 pypi_0 pypi
With no further changes to the build, running
print(tf.config.list_physical_devices('GPU'))
in the same ipynb returns:
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Also, the compatibility matrix is strict: changing versions in the build leads to conflicts. So try with exactly these versions, and let me know if it works:
conda create -n tmpenv python=3.11
conda activate tmpenv
pip install tensorflow[and-cuda]==2.15.1
Create a new env and try with that for now.
Thank you both for your help @Mohan_Krishna_G_R and @sotiris.gkouzias !
I finally figured it out.
When I added the environment lines above:
CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib
to the end of my
.venv/bin/activate
it worked.
Exporting them manually got it to work in the CLI, and putting them in the .venv activate script made them available within VSC.