NVIDIA driver: 545.29.06
OS: Zorin 17 (based on Ubuntu 22.04)
Python: 3.11.7 (via pyenv)
According to this table: https://www.tensorflow.org/install/source#gpu
TensorFlow 2.16.1 requires CUDA 12.3 and cuDNN 8.9, but can someone confirm this?
(The previous two times I installed CUDA, it ended up breaking my NVIDIA drivers.)
Moreover, do I need Clang and Bazel, as the table mentions?
Welcome @sagnik_t to the TensorFlow Community
You can try the following:
- Create a fresh conda virtual environment, activate it, and run:
pip install --upgrade pip
pip install tensorflow[and-cuda]
- Set environment variables:
Locate the directory of the conda environment by running the following in your terminal:
echo $CONDA_PREFIX
Enter that directory and create these subdirectories and files:
cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh
Edit ./etc/conda/activate.d/env_vars.sh as follows:
#!/bin/sh
# Store the original LD_LIBRARY_PATH and PATH so they can be restored on deactivation
export ORIGINAL_LD_LIBRARY_PATH="${LD_LIBRARY_PATH}"
export ORIGINAL_PATH="${PATH}"
# Locate the pip-installed cuDNN package (nvidia-cudnn-cu12)
CUDNN_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)")))
# Prepend the cuDNN library directories to LD_LIBRARY_PATH (find's output already ends with ":")
export LD_LIBRARY_PATH=$(find ${CUDNN_DIR}/*/lib/ -type d -printf "%p:")${LD_LIBRARY_PATH}
# Locate the pip-installed CUDA compiler package (nvidia-cuda-nvcc-cu12), which provides ptxas
PTXAS_DIR=$(dirname $(dirname $(python -c "import nvidia.cuda_nvcc; print(nvidia.cuda_nvcc.__file__)")))
# Prepend the directory containing ptxas to PATH
export PATH=$(find ${PTXAS_DIR}/*/bin/ -type d -printf "%p:")${PATH}
Edit ./etc/conda/deactivate.d/env_vars.sh as follows:
#!/bin/sh
# Restore the original LD_LIBRARY_PATH and PATH
export LD_LIBRARY_PATH="${ORIGINAL_LD_LIBRARY_PATH}"
export PATH="${ORIGINAL_PATH}"
# Clean up helper variables
unset ORIGINAL_LD_LIBRARY_PATH
unset ORIGINAL_PATH
unset CUDNN_DIR
unset PTXAS_DIR
- Deactivate and re-activate the conda environment so the scripts take effect, then verify the GPU setup:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
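For a slightly more detailed check, here is a minimal Python sketch (assuming TensorFlow is already installed in the active environment) that lists the visible GPUs, prints the CUDA/cuDNN versions the wheel was built against, and runs a tiny op on the GPU:
# Sanity check for a TensorFlow GPU install
import tensorflow as tf

# List the GPUs TensorFlow can actually see.
gpus = tf.config.list_physical_devices("GPU")
print("Visible GPUs:", gpus)

# Print the CUDA/cuDNN versions this TensorFlow wheel was built against.
build = tf.sysconfig.get_build_info()
print("Built with CUDA:", build.get("cuda_version"))
print("Built with cuDNN:", build.get("cudnn_version"))

# Optionally run a small matrix multiplication on the GPU to confirm it is usable.
if gpus:
    with tf.device("/GPU:0"):
        x = tf.random.normal((1024, 1024))
        y = tf.matmul(x, x)
    print("Matmul on GPU OK, result shape:", y.shape)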
I have submitted a pull request to update the official TensorFlow installation guide.
I hope it helps!
Thanks for the detailed post on using TensorFlow with GPU in conda, @sotiris.gkouzias.
I have a couple of questions and need your input:
- Does pip install tensorflow[and-cuda] take care of installing the CUDA-related packages, or do I need to install the CUDA driver/CUDA toolkit/cuDNN on my host machine [Ubuntu 22.04] before running the pip command?
- Whenever I create a new environment in conda, do I need to reinstall all these NVIDIA packages to use the underlying GPU? Say I am using TensorFlow 2.15 in env1 and create env2 for TensorFlow 2.16: will all the related CUDA packages get installed in env2, given that the dependencies of 2.15 and 2.16 are different?
Regards,
@ACodingfreak welcome to the TensorFlow Forum!
Indeed, when you run the command pip install tensorflow[and-cuda], all the packages necessary to utilize your GPU locally are installed as well. However, note that a compatible NVIDIA driver must already be installed on the host. That is why you should first check it by running nvidia-smi and then proceed with the installation.
If you wish to install TensorFlow 2.15.1 in a different conda environment, you can run pip install 'tensorflow[and-cuda]==2.15.1' there, and again all the packages necessary to utilize your GPU locally should be installed in that environment as well.
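To see that each conda environment carries its own copy of the CUDA libraries, a quick sketch like the one below (assuming the environment was created with the tensorflow[and-cuda] extra) prints where the pip-provided cuDNN package actually lives; the printed path should point inside that environment's site-packages:
# Confirm the pip-installed CUDA libraries live inside the active environment
import os
import tensorflow as tf
import nvidia.cudnn  # installed by the tensorflow[and-cuda] extra

print("TensorFlow version:", tf.__version__)
print("cuDNN package location:", os.path.dirname(nvidia.cudnn.__file__))
# Expected: .../envs/<env-name>/lib/pythonX.Y/site-packages/nvidia/cudnn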
I hope it helps.
Thanks for the detailed reply @sotiris.gkouzias
Well, I have tried the exact instructions, but it looks like I am not 100% successful.
(tf-gpu) codingfreak@HP-ZBook-PC:~/anaconda3/envs/tf-gpu$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-06-11 18:51:41.128921: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-11 18:51:41.181513: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-11 18:51:41.181551: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-11 18:51:41.182877: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-11 18:51:41.190802: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-11 18:51:42.117957: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-06-11 18:51:42.872312: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-06-11 18:51:42.920830: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-06-11 18:51:42.926008: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
$ conda list
# packages in environment at /home/codingfreak/anaconda3/envs/tf-gpu:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
absl-py 2.1.0 pypi_0 pypi
astunparse 1.6.3 pypi_0 pypi
bzip2 1.0.8 h5eee18b_6
ca-certificates 2024.3.11 h06a4308_0
cachetools 5.3.3 pypi_0 pypi
certifi 2024.6.2 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
flatbuffers 24.3.25 pypi_0 pypi
gast 0.5.4 pypi_0 pypi
google-auth 2.30.0 pypi_0 pypi
google-auth-oauthlib 1.2.0 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.64.1 pypi_0 pypi
h5py 3.11.0 pypi_0 pypi
idna 3.7 pypi_0 pypi
keras 2.15.0 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libclang 18.1.1 pypi_0 pypi
libffi 3.4.4 h6a678d5_1
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
libuuid 1.41.5 h5eee18b_0
markdown 3.6 pypi_0 pypi
markupsafe 2.1.5 pypi_0 pypi
ml-dtypes 0.3.2 pypi_0 pypi
ncurses 6.4 h6a678d5_0
numpy 1.26.4 pypi_0 pypi
nvidia-cublas-cu12 12.2.5.6 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.2.142 pypi_0 pypi
nvidia-cuda-nvcc-cu12 12.2.140 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.2.140 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.2.140 pypi_0 pypi
nvidia-cudnn-cu12 8.9.4.25 pypi_0 pypi
nvidia-cufft-cu12 11.0.8.103 pypi_0 pypi
nvidia-curand-cu12 10.3.3.141 pypi_0 pypi
nvidia-cusolver-cu12 11.5.2.141 pypi_0 pypi
nvidia-cusparse-cu12 12.1.2.141 pypi_0 pypi
nvidia-nccl-cu12 2.16.5 pypi_0 pypi
nvidia-nvjitlink-cu12 12.2.140 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
openssl 3.0.13 h7f8727e_2
opt-einsum 3.3.0 pypi_0 pypi
packaging 24.1 pypi_0 pypi
pip 24.0 py311h06a4308_0
protobuf 4.25.3 pypi_0 pypi
pyasn1 0.6.0 pypi_0 pypi
pyasn1-modules 0.4.0 pypi_0 pypi
python 3.11.9 h955ad1f_0
readline 8.2 h5eee18b_0
requests 2.32.3 pypi_0 pypi
requests-oauthlib 2.0.0 pypi_0 pypi
rsa 4.9 pypi_0 pypi
setuptools 69.5.1 py311h06a4308_0
six 1.16.0 pypi_0 pypi
sqlite 3.45.3 h5eee18b_0
tensorboard 2.15.2 pypi_0 pypi
tensorboard-data-server 0.7.2 pypi_0 pypi
tensorflow 2.15.1 pypi_0 pypi
tensorflow-estimator 2.15.0 pypi_0 pypi
tensorflow-io-gcs-filesystem 0.37.0 pypi_0 pypi
termcolor 2.4.0 pypi_0 pypi
tk 8.6.14 h39e8969_0
typing-extensions 4.12.2 pypi_0 pypi
tzdata 2024a h04d1e81_0
urllib3 2.2.1 pypi_0 pypi
werkzeug 3.0.3 pypi_0 pypi
wheel 0.43.0 py311h06a4308_0
wrapt 1.14.1 pypi_0 pypi
xz 5.4.6 h5eee18b_1
zlib 1.2.13 h5eee18b_1
(tf-gpu) codingfreak@HP-ZBook-PC:~/anaconda3/envs/tf-gpu$ pip list
Package Version
---------------------------- ----------
absl-py 2.1.0
astunparse 1.6.3
cachetools 5.3.3
certifi 2024.6.2
charset-normalizer 3.3.2
flatbuffers 24.3.25
gast 0.5.4
google-auth 2.30.0
google-auth-oauthlib 1.2.0
google-pasta 0.2.0
grpcio 1.64.1
h5py 3.11.0
idna 3.7
keras 2.15.0
libclang 18.1.1
Markdown 3.6
MarkupSafe 2.1.5
ml-dtypes 0.3.2
numpy 1.26.4
nvidia-cublas-cu12 12.2.5.6
nvidia-cuda-cupti-cu12 12.2.142
nvidia-cuda-nvcc-cu12 12.2.140
nvidia-cuda-nvrtc-cu12 12.2.140
nvidia-cuda-runtime-cu12 12.2.140
nvidia-cudnn-cu12 8.9.4.25
nvidia-cufft-cu12 11.0.8.103
nvidia-curand-cu12 10.3.3.141
nvidia-cusolver-cu12 11.5.2.141
nvidia-cusparse-cu12 12.1.2.141
nvidia-nccl-cu12 2.16.5
nvidia-nvjitlink-cu12 12.2.140
oauthlib 3.2.2
opt-einsum 3.3.0
packaging 24.1
pip 24.0
protobuf 4.25.3
pyasn1 0.6.0
pyasn1_modules 0.4.0
requests 2.32.3
requests-oauthlib 2.0.0
rsa 4.9
setuptools 69.5.1
six 1.16.0
tensorboard 2.15.2
tensorboard-data-server 0.7.2
tensorflow 2.15.1
tensorflow-estimator 2.15.0
tensorflow-io-gcs-filesystem 0.37.0
termcolor 2.4.0
typing_extensions 4.12.2
urllib3 2.2.1
Werkzeug 3.0.3
wheel 0.43.0
wrapt 1.14.1
(tf-gpu) codingfreak@HP-ZBook-PC:~/anaconda3/envs/tf-gpu$ nvcc -V
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit
(tf-gpu) codingfreak@HP-ZBook-PC:~/anaconda3/envs/tf-gpu$
(tf-gpu) codingfreak@HP-ZBook-PC:~/anaconda3/envs/tf-gpu$ nvidia-smi
Tue Jun 11 18:58:40 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02 Driver Version: 555.42.02 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX 2000 Ada Gene... Off | 00000000:01:00.0 Off | N/A |
| N/A 37C P3 10W / 45W | 15MiB / 8188MiB | 7% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2495 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------------------+
@ACodingfreak no worries, those are harmless warnings. The output looks perfectly normal (for more info you can read the relevant discussion here).
NUMA is a memory architecture used in multiprocessor systems where the memory access time depends on the memory location relative to the processor.
NUMA support is important for optimizing memory access on systems with multiple CPUs or GPUs. It allows the operating system to allocate memory and schedule processes in a way that reduces memory access latency.
To validate that your GPU is actually being utilized, try training a relatively simple deep learning model on your PC with a ready-to-use TensorFlow dataset for 5 epochs and time it. Then run the exact same experiment in Google Colab with GPU acceleration enabled, time it, and compare the results.
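A minimal benchmarking sketch along those lines (a rough illustration only; the MNIST dataset, model size, and epoch count are arbitrary choices, not a prescribed setup) could look like this, run both locally and in Colab:
# Time a small model for a few epochs to compare local GPU vs. Colab
import time
import tensorflow as tf

print("Visible GPUs:", tf.config.list_physical_devices("GPU"))

# Ready-to-use dataset bundled with Keras.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

start = time.time()
model.fit(x_train, y_train, epochs=5, batch_size=128, verbose=2)
elapsed = time.time() - start
print(f"Total training time: {elapsed:.1f} s ({elapsed / 5:.1f} s per epoch)")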
@sotiris.gkouzias
Thanks for your previous reply, and sorry for my late follow-up.
Let me add some details regarding my setup: an HP ZBook laptop with an RTX 2000 GPU and Ubuntu 22.04.
As mentioned in previous comments, I installed NVIDIA driver version 555.42, and somehow after a week it simply vanished; I am not sure how that happened. So I ended up installing the Ubuntu-recommended driver, which is 535 with CUDA 12.2 and should work for TensorFlow 2.15.1.
$ nvidia-smi
Mon Jun 24 10:54:56 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA RTX 2000 Ada Gene... Off | 00000000:01:00.0 Off | N/A |
| N/A 40C P8 3W / 45W | 7825MiB / 8188MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 2051 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 50888 C ...ak/anaconda3/envs/tf-gpu/bin/python 6414MiB |
| 0 N/A N/A 55908 C ...ak/anaconda3/envs/tf-gpu/bin/python 1302MiB |
| 0 N/A N/A 59240 C ...ak/anaconda3/envs/tf-gpu/bin/python 92MiB |
I have tried the GAN example shared in the link below to check the performance.
- Local GPU (TF 2.15.1): 4.5 seconds per epoch
- Colab GPU (TF 2.15): 10 seconds per epoch
- Local CPU (TF 2.16.1): 126 seconds per epoch
Impressive results, @ACodingfreak! It seems that you have successfully utilized your GPU with TensorFlow 2.15.1. Note that if you want to explore and use Keras 3 and its awesome capabilities for your deep learning experiments, it is best to install TensorFlow 2.16.1.
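As a quick check after upgrading (TensorFlow 2.16 bundles Keras 3 by default), you can confirm which Keras major version the environment picked up:
# After installing tensorflow[and-cuda]==2.16.1, confirm that Keras 3 is in use
import tensorflow as tf
import keras

print("TensorFlow:", tf.__version__)  # expected 2.16.x
print("Keras:", keras.__version__)    # expected 3.x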
@sotiris.gkouzias - Thanks for your help
Well, I will try TF 2.16, hopefully by the end of June.
I do have a couple of questions which I want to understand:
- While trying to understand the environment variables, I came across the ptxas directory, which seems to contain nvcc, nccl, the CUDA runtime, and so on:
:~/anaconda3/envs/tf-gpu/lib/python3.11/site-packages/nvidia$ ls
cublas cuda_cupti cuda_nvcc cuda_nvrtc cuda_runtime cudnn cufft curand cusolver cusparse __init__.py nccl nvjitlink __pycache__
So technically I don't need any explicit installation of the CUDA toolkit if I am using tensorflow[and-cuda]?
- I am not able to access nvcc from the terminal. Is this something only available inside the anaconda environment?
As always thanks for the detailed reply @sotiris.gkouzias
1 Like
Hi there @ACodingfreak, it seems like you have successfully run TensorFlow 2.15.1 on your local GPU. I thought the latest version of TensorFlow that can run with a GPU was only 2.10.0? I am seeking advice: it seems like I have configured too much in my Anaconda environment, and now it is rather messed up and I have no idea how to fix it.
@Yummy_Gang
Just follow the steps shared by @Sotiris_Gkouzias in this post.