Hello,
I have installed CUDA 12.2 and cuDNN 8.9 on a local Ubuntu 22.04 machine.
The machine has four NVIDIA RTX A4000 graphics cards (see the nvidia-smi output below). After installation, when I call tf.config.list_physical_devices('GPU'), TensorFlow reports 0 GPUs and logs the errors shown below. Could you please help me resolve this issue?
Error Log:
>>> import tensorflow as tf
2024-07-08 10:24:38.718771: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-07-08 10:24:38.752647: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-08 10:24:38.752683: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-08 10:24:38.753851: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-08 10:24:38.759529: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-07-08 10:24:38.759678: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-08 10:24:39.292109: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
>>>
>>>
>>> print("Num GPUs Available:", len(tf.config.experimental.list_physical_devices('GPU')))
2024-07-08 10:25:15.916315: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:274] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2024-07-08 10:25:15.916354: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:129] retrieving CUDA diagnostic information for host: eeegssc
2024-07-08 10:25:15.916364: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:136] hostname: eeegssc
2024-07-08 10:25:15.916427: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:159] libcuda reported version is: 535.183.1
2024-07-08 10:25:15.916454: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:163] kernel reported version is: 535.183.1
2024-07-08 10:25:15.916461: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:241] kernel version seems to match DSO: 535.183.1
Num GPUs Available: 0
>>> print("Num CPUs Available: ", len(tf.config.list_physical_devices('CPU')))
Num CPUs Available: 1
>>> print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
Num GPUs Available: 0
>>>
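Since the failing call in the log is cuInit, one way to check whether the problem sits in the driver rather than in TensorFlow is to call cuInit directly through ctypes. This is only a minimal sketch: the helper name try_cu_init is hypothetical, and it assumes the driver library is loadable as libcuda.so.1. A return value of 0 is CUDA_SUCCESS, and 999 is the CUresult code for CUDA_ERROR_UNKNOWN.

```python
import ctypes


def try_cu_init(lib_name="libcuda.so.1"):
    """Call cuInit(0) from the CUDA driver library directly.

    Returns the integer CUresult code, or None if the library
    cannot be loaded. The default library name is an assumption.
    """
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError:
        return None
    # CUresult cuInit(unsigned int Flags); Flags must be 0.
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    return libcuda.cuInit(0)


if __name__ == "__main__":
    code = try_cu_init()
    if code is None:
        print("libcuda.so.1 could not be loaded")
    elif code == 0:
        print("cuInit succeeded (CUDA_SUCCESS)")
    else:
        print(f"cuInit failed with CUresult {code}")
```

If this also fails outside TensorFlow, the issue is likely at the driver level rather than in the TensorFlow or cuDNN installation.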
(venv) (base) gssc@eeegssc:~/bala/cudnn$ nvidia-smi
Mon Jul 8 10:47:03 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A4000               Off | 00000000:21:00.0 Off |                  Off |
| 41%   34C    P8               8W / 140W |      1MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A4000               Off | 00000000:22:00.0 Off |                  Off |
| 41%   33C    P8               9W / 140W |      1MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A4000               Off | 00000000:41:00.0 Off |                  Off |
| 41%   32C    P8               5W / 140W |      1MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A4000               Off | 00000000:43:00.0 Off |                  Off |
| 41%   34C    P8               7W / 140W |      1MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
(venv) (base) gssc@eeegssc:~/bala/cudnn$
(venv) (base) gssc@eeegssc:/usr/local/cuda-12.2$ ls
bin extras gds-12.2 lib64 nsight-compute-2023.2.2 nsight-systems-2023.2.3 nvvm src version.json
compute-sanitizer gds include libnvvp nsightee_plugins nvml share targets
(venv) (base) gssc@eeegssc:/usr/local/cuda-12.2$
(venv) (base) gssc@eeegssc:/usr/local/cuda-12.2/include$ ls *cudnn*
cudnn_adv_infer.h cudnn_adv_train_v8.h cudnn_cnn_infer.h cudnn_cnn_train_v8.h cudnn_ops_infer_v8.h cudnn_v8.h
cudnn_adv_infer_v8.h cudnn_backend.h cudnn_cnn_infer_v8.h cudnn.h cudnn_ops_train.h cudnn_version.h
cudnn_adv_train.h cudnn_backend_v8.h cudnn_cnn_train.h cudnn_ops_infer.h cudnn_ops_train_v8.h cudnn_version_v8.h
(venv) (base) gssc@eeegssc:/usr/local/cuda-12.2/include$
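To confirm which cuDNN version the headers listed above actually belong to, the version macros in cudnn_version.h can be read. The following is a rough sketch, assuming the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integer macros; parse_cudnn_version is a hypothetical helper name, and the path is the include directory shown in the listing.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cudnn_version.h text.

    Returns None if any of the three macros is missing.
    """
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            rf"^\s*#define\s+{name}\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    # Path taken from the directory listing in this post.
    header = Path("/usr/local/cuda-12.2/include/cudnn_version.h")
    if header.is_file():
        print(parse_cudnn_version(header.read_text()))
    else:
        print(f"{header} not found")
```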
The PATH and LD_LIBRARY_PATH environment variables are set correctly.
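For completeness, the environment can be checked from the same Python session that imports TensorFlow, since a shell and a virtual environment may not see the same values. This is a small sketch, not the exact check that was run: cuda_entries is a hypothetical helper, and filtering on the substring "cuda" is an assumption about how the install directories are named.

```python
import os


def cuda_entries(var_name, environ=None):
    """Return the entries of a path-list variable that mention 'cuda'.

    environ defaults to os.environ; a dict can be passed for testing.
    """
    env = os.environ if environ is None else environ
    value = env.get(var_name, "")
    return [
        entry
        for entry in value.split(os.pathsep)
        if entry and "cuda" in entry.lower()
    ]


if __name__ == "__main__":
    for name in ("PATH", "LD_LIBRARY_PATH"):
        entries = cuda_entries(name)
        print(f"{name}: {entries if entries else 'no CUDA entries found'}")
```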