Error: Could not find cuda drivers on your machine, GPU will not be used

Hello,
I have installed CUDA 12.2 and cuDNN 8.9 on a local Ubuntu 22.04 machine with NVIDIA RTX A4000 graphics cards. After installation, when I run tf.config.list_physical_devices('GPU'), I get the errors below. Can you please help me address this issue?

Error Log:

>>> import tensorflow as tf
2024-07-08 10:24:38.718771: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-07-08 10:24:38.752647: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-08 10:24:38.752683: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-08 10:24:38.753851: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-08 10:24:38.759529: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-07-08 10:24:38.759678: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-08 10:24:39.292109: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
>>>
>>>
>>> print("Num GPUs Available:", len(tf.config.experimental.list_physical_devices('GPU')))
2024-07-08 10:25:15.916315: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:274] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2024-07-08 10:25:15.916354: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:129] retrieving CUDA diagnostic information for host: eeegssc
2024-07-08 10:25:15.916364: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:136] hostname: eeegssc
2024-07-08 10:25:15.916427: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:159] libcuda reported version is: 535.183.1
2024-07-08 10:25:15.916454: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:163] kernel reported version is: 535.183.1
2024-07-08 10:25:15.916461: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:241] kernel version seems to match DSO: 535.183.1
Num GPUs Available: 0

>>> print("Num CPUs Available: ", len(tf.config.list_physical_devices('CPU')))
Num CPUs Available:  1
>>> print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
Num GPUs Available:  0
>>>
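One detail worth noting: nvidia-smi works (output below) while cuInit inside the Python process fails with CUDA_ERROR_UNKNOWN. That combination can point at /dev/nvidia* device nodes that are missing or unreadable for the calling user, or an unloaded nvidia_uvm module. A quick sketch to check (readable_device_nodes is my own helper, not a TF API):

```python
import glob
import os

def readable_device_nodes(pattern="/dev/nvidia*"):
    """List NVIDIA device nodes the current user can both read and write."""
    return [p for p in glob.glob(pattern) if os.access(p, os.R_OK | os.W_OK)]

# On a healthy 4-GPU setup this should include /dev/nvidia0..3,
# /dev/nvidiactl and /dev/nvidia-uvm; an empty or partial list
# would explain cuInit failing while nvidia-smi still works.
print(readable_device_nodes())
```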

(venv) (base) gssc@eeegssc:~/bala/cudnn$ nvidia-smi
Mon Jul  8 10:47:03 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A4000               Off | 00000000:21:00.0 Off |                  Off |
| 41%   34C    P8               8W / 140W |      1MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A4000               Off | 00000000:22:00.0 Off |                  Off |
| 41%   33C    P8               9W / 140W |      1MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A4000               Off | 00000000:41:00.0 Off |                  Off |
| 41%   32C    P8               5W / 140W |      1MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A4000               Off | 00000000:43:00.0 Off |                  Off |
| 41%   34C    P8               7W / 140W |      1MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
(venv) (base) gssc@eeegssc:~/bala/cudnn$


(venv) (base) gssc@eeegssc:/usr/local/cuda-12.2$ ls
bin                extras  gds-12.2  lib64    nsight-compute-2023.2.2  nsight-systems-2023.2.3  nvvm   src      version.json
compute-sanitizer  gds     include   libnvvp  nsightee_plugins         nvml                     share  targets
(venv) (base) gssc@eeegssc:/usr/local/cuda-12.2$
(venv) (base) gssc@eeegssc:/usr/local/cuda-12.2$


(venv) (base) gssc@eeegssc:/usr/local/cuda-12.2/include$ ls *cudnn*
cudnn_adv_infer.h     cudnn_adv_train_v8.h  cudnn_cnn_infer.h     cudnn_cnn_train_v8.h  cudnn_ops_infer_v8.h  cudnn_v8.h
cudnn_adv_infer_v8.h  cudnn_backend.h       cudnn_cnn_infer_v8.h  cudnn.h               cudnn_ops_train.h     cudnn_version.h
cudnn_adv_train.h     cudnn_backend_v8.h    cudnn_cnn_train.h     cudnn_ops_infer.h     cudnn_ops_train_v8.h  cudnn_version_v8.h
(venv) (base) gssc@eeegssc:/usr/local/cuda-12.2/include$
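To double-check that the installed cuDNN really is 8.9, the version macros in cudnn_version.h can be read directly. A small sketch (the parsing helper is mine; the header path is the one shown above):

```python
import os
import re

def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the contents of cudnn_version.h."""
    version = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(rf"#define\s+{key}\s+(\d+)", header_text)
        version.append(int(m.group(1)) if m else None)
    return tuple(version)

path = "/usr/local/cuda-12.2/include/cudnn_version.h"
if os.path.exists(path):
    with open(path) as f:
        # Should report major 8, minor 9 if cuDNN 8.9 is installed
        print(parse_cudnn_version(f.read()))
```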

The PATH and LD_LIBRARY_PATH environment variables are set properly.
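Since TF's stub reports that it cannot find the CUDA runtime, it may help to verify that LD_LIBRARY_PATH actually resolves the libraries. A quick sketch (find_in_paths is my own helper, not a TF API):

```python
import glob
import os

def find_in_paths(pattern, path_var="LD_LIBRARY_PATH"):
    """Glob for a filename pattern in every directory of a PATH-style variable."""
    hits = []
    for d in os.environ.get(path_var, "").split(os.pathsep):
        if d:
            hits.extend(glob.glob(os.path.join(d, pattern)))
    return hits

for lib in ("libcuda.so*", "libcudart.so*", "libcudnn.so*"):
    print(lib, "->", find_in_paths(lib) or "not found on LD_LIBRARY_PATH")
```

Note that libcuda itself is normally resolved by the system loader via ldconfig rather than LD_LIBRARY_PATH, so `ldconfig -p | grep libcuda` is also worth checking.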


A few questions:

  • Which versions of TF and Python are you running in that venv?
  • How did you install TensorFlow? (The exact command.)
  • Is that standalone Ubuntu or WSL/Ubuntu?
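For the first question, something like this pasted into the venv would show everything at once (a plain sketch; tf.sysconfig.get_build_info() is available in recent TF releases):

```python
import importlib.util
import sys

def module_available(name):
    """True if the named module can be imported in this environment."""
    return importlib.util.find_spec(name) is not None

print("Python:", sys.version.split()[0])
if module_available("tensorflow"):
    import tensorflow as tf
    print("TensorFlow:", tf.__version__)
    # GPU wheels record the CUDA/cuDNN versions they were built against;
    # these must be compatible with the locally installed 12.2 / 8.9.
    info = tf.sysconfig.get_build_info()
    print("Built for CUDA:", info.get("cuda_version"))
    print("Built for cuDNN:", info.get("cudnn_version"))
else:
    print("tensorflow is not importable here")
```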