TensorFlow installation issue: GPU not found (Ubuntu, NVIDIA RTX 3050)

Hi there!

I am trying to install TensorFlow (second attempt) to use on my laptop with an NVIDIA RTX 3050 GPU.
Ubuntu under WSL2.

I followed the instructions listed here: Install TensorFlow with pip

But at the step where I check for registered GPUs, I get a bad outcome:

import tensorflow as tf

print(f"TensorFlow version: {tf.__version__}")

# List physical GPUs
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    print(f"GPUs available: {gpus}")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
else:
    print("No GPU available")

gives:

TensorFlow version: 2.16.1
No GPU available

(.venv) fire@note-4:~/py_projects/ATARI_YW_elim_p311$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-07-11 14:47:12.877129: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-07-11 14:47:12.918243: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-11 14:47:13.412965: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-07-11 14:47:14.017634: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-07-11 14:47:14.054469: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

So TensorFlow can't see any GPU.
It seems something is wrong with the environment variables… How can I check/set the required variables? Note: I am not using conda, just a manually created virtual environment.
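For context, here is roughly how I have been inspecting those variables so far (plain bash; the libcudnn check is just my guess at what matters):

```shell
# Print the search-path variables a CUDA-enabled build consults.
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"
echo "PATH=${PATH}"

# Check whether any directory on LD_LIBRARY_PATH actually contains cuDNN:
IFS=':'
for d in ${LD_LIBRARY_PATH:-}; do
    [ -d "$d" ] && ls "$d"/libcudnn* 2>/dev/null
done
unset IFS
```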

What am I doing wrong, or how can I make GPU work to speed up calculations?

When running nvidia-smi I got:


(.venv) fire@note-4:~/py_projects/ATARI_YW_elim_p311$ nvidia-smi
Thu Jul 11 15:08:38 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.07             Driver Version: 537.34       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    On  | 00000000:01:00.0 Off |                  N/A |
| N/A   59C    P8               4W /  40W |      0MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

But when running lspci:

(.venv) fire@note-4:~/py_projects/ATARI_YW_elim_p311$ lspci
05d5:00:00.0 System peripheral: Red Hat, Inc. Virtio file system (rev 01)
4715:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio 1.0 console (rev 01)
53c1:00:00.0 3D controller: Microsoft Corporation Basic Render Driver
e04a:00:00.0 3D controller: Microsoft Corporation Basic Render Driver

Please guide me on how to get TensorFlow working with the GPU.

Python 3.11
CUDA installed:
(.venv) fire@note-4:~$ nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
  • Can you echo $LD_LIBRARY_PATH and echo $PATH?
  • TensorFlow and Python versions?
  • Are you running in WSL?

Uninstall and reinstall.

@Simon_Au-Yong

That post explicitly states that nvidia-smi does not see the drivers.

nvidia-smi
yields
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.


@twistfire

Something like this might work, but you need to understand the command and the problem before issuing it:

# Locate site-packages/nvidia via the pip-installed cuDNN wheel,
# then prepend each bundled lib/ directory to the loader path.
NVIDIA_DIR=$(dirname "$(dirname "$(python -c 'import nvidia.cudnn; print(nvidia.cudnn.__file__)')")")
for dir in "$NVIDIA_DIR"/*; do
    if [ -d "$dir/lib" ]; then
        export LD_LIBRARY_PATH="$dir/lib:$LD_LIBRARY_PATH"
    fi
done

Source.

Here is the output:

(.venv) fire@note-4:~$ echo $LD_LIBRARY_PATH
/usr/local/cuda-12.5/lib64:/usr/local/cuda-12.5/lib64:
(.venv) fire@note-4:~$ echo $PATH
/home/fire/.vscode-server/extensions/ms-python.python-2024.8.1/python_files/deactivate/bash:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/bin:/usr/local/cuda-12.5/bin:/home/fire/miniconda3/bin:/home/fire/.vscode-server/extensions/ms-python.python-2024.8.1/python_files/deactivate/bash:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/bin:/home/fire/.vscode-server/bin/ea1445cc7016315d0f5728f8e8b12a45dc0a7286/bin/remote-cli:/home/fire/.local/bin:/usr/local/cuda-12.5/bin:/home/fire/miniconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/wsl/lib:/mnt/c/Program Files/Microsoft/jdk-11.0.12.7-hotspot/bin:/mnt/c/Windows/system32:/mnt/c/Windows:/mnt/c/Windows/System32/Wbem:/mnt/c/Windows/System32/WindowsPowerShell/v1.0/:/mnt/c/Windows/System32/OpenSSH/:/mnt/c/Program Files/NVIDIA Corporation/NVIDIA NvDLISR:/mnt/c/Program Files/Microsoft SQL Server/150/Tools/Binn/:/mnt/c/Program Files/Microsoft SQL Server/Client SDK/ODBC/170/Tools/Binn/:/mnt/c/Program Files/dotnet/:/mnt/c/WINDOWS/system32:/mnt/c/WINDOWS:/mnt/c/WINDOWS/System32/Wbem:/mnt/c/WINDOWS/System32/WindowsPowerShell/v1.0/:/mnt/c/WINDOWS/System32/OpenSSH/:/mnt/c/Program Files/Microsoft SQL Server/Client SDK/ODBC/110/Tools/Binn/:/mnt/c/Program Files (x86)/Windows Kits/8.1/Windows Performance Toolkit/:/mnt/c/Program Files/WireGuard/:/mnt/c/Program Files (x86)/MINI-REFPROP:/mnt/c/Program Files/Calibre2/:/mnt/c/Program Files/Git/cmd:/mnt/c/Program Files/MATLAB/R2022b/runtime/win64:/mnt/c/Program Files/MATLAB/R2022b/bin:/mnt/c/Users/fire/AppData/Local/Microsoft/WindowsApps:/mnt/c/Users/fire/AppData/Local/Programs/Microsoft VS Code/bin:/mnt/c/Users/fire/.dotnet/tools:/mnt/c/Users/fire/AppData/Local/Programs/MiKTeX/miktex/bin/x64/:/snap/bin

I tried adding these lines to my .venv/bin/activate script…

Now when echo $LD_LIBRARY_PATH I have:

fire@note-4:~/py_projects/ATARI_YW_elim_p311$ echo $LD_LIBRARY_PATH
/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/nvjitlink/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/nccl/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cusparse/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cusolver/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/curand/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cufft/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cudnn/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cuda_runtime/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cuda_nvrtc/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cuda_nvcc/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cuda_cupti/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cublas/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/nvjitlink/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/nccl/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cusparse/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cusolver/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/curand/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cufft/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cudnn/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cuda_runtime/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cuda_nvrtc/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cuda_nvcc/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cuda_cupti/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cublas/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/nvjitlink/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/nccl/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cusparse/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cusolver/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/curand/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cufft/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cudnn/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cuda_runtime/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cuda_nvrtc/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cuda_nvcc/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cuda_cupti/lib:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cublas/lib:/usr/local/cuda-12.5/lib64:/usr/local/cuda-12.5/lib64::/lib/:/home/fire/py_projects/ATARI_YW_elim_p311/.venv/lib/python3.11/site-packages/nvidia/cudnn/lib

Sorry for the large listing.

But TensorFlow still can't see the GPU…

Also, I don't understand why you say nvidia-smi can't see the GPU: it is listed in the nvidia-smi output. It's the Python/Jupyter test script in this environment that can't see it.

Also note that the two commands report different CUDA versions, which I don't understand.


fire@note-4:~/py_projects/ATARI_YW_elim_p311$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
fire@note-4:~/py_projects/ATARI_YW_elim_p311$ nvidia-smi
Thu Jul 11 19:55:51 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.07             Driver Version: 537.34       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    On  | 00000000:01:00.0 Off |                  N/A |
| N/A   51C    P8               4W /  40W |      0MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Where should I dig to set this up properly?
Can anybody suggest step-by-step instructions for fixing it? There seem to be tonnes of similar issues, and I can't figure out whether there is a stable, working recipe.

I assume that the issue is probably with versions and paths.

Zhaopudark on GitHub has a helpful post on TensorFlow 2.16.1 under WSL2.

Or use 2.15.

@twistfire

Just to be clear, I'm only another user, but here are my simple explanations and what you can easily try in five minutes to solve it.

Analysis

  • The driver is installed.
  • Python 3.11 is fine with TF 2.16.
  • But CUDA should be 12.3 and cuDNN 8.9.

I’m following the tables here.

So, to be clear, CUDA 12.5 is not the correct version for TF 2.16, but let's inspect further.

The matching CUDA and cuDNN libraries should also be installed by tensorflow[and-cuda], and those should take precedence over the system-wide installs.
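For what it's worth, the way those wheel-bundled libraries end up on the loader path can be sketched like this. The directory tree below is simulated with mktemp to keep the sketch self-contained; in a real venv NVIDIA_DIR would come from the installed cudnn wheel, not mktemp:

```shell
# Simulate the site-packages/nvidia tree that tensorflow[and-cuda] installs
# (hypothetical layout, only cudnn and cublas shown).
NVIDIA_DIR=$(mktemp -d)
mkdir -p "$NVIDIA_DIR/cudnn/lib" "$NVIDIA_DIR/cublas/lib"

# Prepend every bundled lib/ directory so it shadows /usr/local/cuda-* entries.
LIB_PATH=""
for dir in "$NVIDIA_DIR"/*; do
    if [ -d "$dir/lib" ]; then
        LIB_PATH="$dir/lib${LIB_PATH:+:$LIB_PATH}"
    fi
done
echo "$LIB_PATH"
```

Because the wheel directories are prepended, the dynamic loader finds them before any system-wide CUDA 12.5 install.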

Main Problem

Note that this command:

(.venv) fire@note-4:~$ echo $LD_LIBRARY_PATH
/usr/local/cuda-12.5/lib64:/usr/local/cuda-12.5/lib64:

does not include any .venv path.

Step towards solution

What is the output of

python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"

(with your venv activated)?

Then you will likely need to add that directory to the paths.
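Once that command prints a path, the export would look roughly like this. The __init__.py path below is a placeholder standing in for whatever your command actually prints, not your real venv path:

```shell
# Placeholder for the output of:
#   python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"
CUDNN_INIT="/tmp/example-venv/lib/python3.11/site-packages/nvidia/cudnn/__init__.py"

# The shared libraries live in the wheel's lib/ subdirectory next to __init__.py.
CUDNN_LIB="$(dirname "$CUDNN_INIT")/lib"
export LD_LIBRARY_PATH="$CUDNN_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
```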
