but when I try to run my TensorFlow code, I get:
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-10 19:48:31.318358: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/openmpi/lib/:/usr/local/cuda/lib64:/usr/local/lib:/usr/lib:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/mpi/lib:/lib/:/usr/local/cuda/lib64:/usr/local/lib:/usr/lib:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib:/opt/amazon/efa/lib:/usr/local/mpi/lib:/opt/amazon/openmpi/lib:/usr/lib64/openmpi/lib/:/usr/local/cuda/lib64:/usr/local/lib:/usr/lib:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/mpi/lib:/lib/:
2021-09-10 19:48:31.319345: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1835] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
and when I check the GPUs with print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU'))), the output is:
Num GPUs Available: 0
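To narrow down which CUDA libraries fail to load, beyond the one named in the warning, a small check like the following could help. It is only a sketch: the library names other than libcusolver.so.11 (which comes from the log above) are assumptions about what this TensorFlow build looks for, and find_missing is a hypothetical helper name.

```python
import ctypes

# Sonames to probe. Only libcusolver.so.11 is taken from the log above;
# the others are assumptions and may differ for your TensorFlow version.
CANDIDATE_LIBS = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.11",
    "libcusparse.so.11",
    "libcudnn.so.8",
]


def find_missing(libs):
    """Return the names in `libs` that the dynamic loader cannot open."""
    missing = []
    for name in libs:
        try:
            # Uses the same loader search path (LD_LIBRARY_PATH, ldconfig cache)
            # as the current process.
            ctypes.CDLL(name)
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in find_missing(CANDIDATE_LIBS):
        print("cannot load:", name)
```

Run it in the same shell and environment as the TensorFlow script, so that LD_LIBRARY_PATH is identical.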
Can you please advise on how to downgrade CUDA from 11.4 to 11.2?
I tried reinstalling everything, but wasn't able to resolve this. When installing CUDA 11.2 again, I get:
The following packages have unmet dependencies:
cuda : Depends: cuda-11-4 (>= 11.4.2) but it is not going to be installed
cuda-runtime-11-2 : Depends: cuda-drivers (>= 460.32.03) but it is not going to be installed
linux-image-4.15.0-1102-azure : Conflicts: linux-image-unsigned-4.15.0-1102-azure but 4.15.0-1102.113 is to be installed
linux-image-4.15.0-1102-oem : Conflicts: linux-image-unsigned-4.15.0-1102-oem but 4.15.0-1102.113 is to be installed
linux-image-4.15.0-1112-azure : Conflicts: linux-image-unsigned-4.15.0-1112-azure but 4.15.0-1112.125 is to be installed
linux-image-4.15.0-1122-azure : Conflicts: linux-image-unsigned-4.15.0-1122-azure but 4.15.0-1122.135 is to be installed
linux-image-unsigned-4.15.0-1102-azure : Conflicts: linux-image-4.15.0-1102-azure but 4.15.0-1102.113 is to be installed
linux-image-unsigned-4.15.0-1102-oem : Conflicts: linux-image-4.15.0-1102-oem but 4.15.0-1102.113 is to be installed
linux-image-unsigned-4.15.0-1112-azure : Conflicts: linux-image-4.15.0-1112-azure but 4.15.0-1112.125 is to be installed
linux-image-unsigned-4.15.0-1122-azure : Conflicts: linux-image-4.15.0-1122-azure but 4.15.0-1122.135 is to be installed
linux-modules-nvidia-390-4.15.0-1102-aws : Depends: nvidia-kernel-common-390 (<= 390.143-1) but 390.144-0ubuntu0.18.04.1 is to be installed
linux-modules-nvidia-390-4.15.0-1112-azure : Depends: nvidia-kernel-common-390 (<= 390.141-1) but 390.144-0ubuntu0.18.04.1 is to be installed
linux-modules-nvidia-450-4.15.0-1102-aws : Depends: nvidia-kernel-common-450 (<= 450.119.03-1) but it is not going to be installed
Depends: nvidia-kernel-common-450 (>= 450.119.03) but it is not going to be installed
linux-modules-nvidia-450-4.15.0-1112-azure : Depends: nvidia-kernel-common-450 (<= 450.102.04-1) but it is not going to be installed
Depends: nvidia-kernel-common-450 (>= 450.102.04) but it is not going to be installed
linux-modules-nvidia-460-4.15.0-1102-aws : Depends: nvidia-kernel-common-460 (<= 460.73.01-1) but it is not going to be installed
Depends: nvidia-kernel-common-460 (>= 460.73.01) but it is not going to be installed
linux-modules-nvidia-460-4.15.0-1112-azure : Depends: nvidia-kernel-common-460 (<= 460.56-1) but it is not going to be installed
Depends: nvidia-kernel-common-460 (>= 460.56) but it is not going to be installed
linux-modules-nvidia-460-4.15.0-1122-azure : Depends: nvidia-kernel-common-460 (<= 460.91.03-1) but it is not going to be installed
Depends: nvidia-kernel-common-460 (>= 460.91.03) but it is not going to be installed
linux-modules-nvidia-470-4.15.0-1122-azure : Depends: nvidia-kernel-common-470 (<= 470.57.02-1) but it is not going to be installed
Depends: nvidia-kernel-common-470 (>= 470.57.02) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
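Since the apt output above is long, it may be easier to read once grouped by package. The following is a rough sketch that collects, for each blocked package, the dependencies apt complains about. The function name summarize_unmet and the regular expressions are my own assumptions about the line format, based only on the output pasted above.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   "cuda : Depends: cuda-11-4 (>= 11.4.2) but it is not going to be installed"
HEAD = re.compile(r"^\s*(\S+) : (Depends|Conflicts|Recommends): (\S+)")
# Matches continuation lines that omit the package name, such as:
#   "Depends: nvidia-kernel-common-450 (>= 450.119.03) but ..."
CONT = re.compile(r"^\s*(Depends|Conflicts|Recommends): (\S+)")


def summarize_unmet(apt_output):
    """Map each blocked package to a sorted list of (relation, target) pairs."""
    blocked = defaultdict(set)
    current = None
    for line in apt_output.splitlines():
        m = HEAD.match(line)
        if m:
            current = m.group(1)
            blocked[current].add((m.group(2), m.group(3)))
            continue
        m = CONT.match(line)
        if m and current is not None:
            # Continuation lines belong to the most recently named package.
            blocked[current].add((m.group(1), m.group(2)))
    return {pkg: sorted(deps) for pkg, deps in blocked.items()}


if __name__ == "__main__":
    sample = (
        "The following packages have unmet dependencies:\n"
        " cuda : Depends: cuda-11-4 (>= 11.4.2) but it is not going to be installed\n"
        " cuda-runtime-11-2 : Depends: cuda-drivers (>= 460.32.03) but it is not going to be installed\n"
    )
    for pkg, deps in summarize_unmet(sample).items():
        print(pkg, deps)
```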
I also tried running Docker, but it only works without the --gpus flag. When I try: docker run --gpus all -it tensorflow/tensorflow:latest-gpu bash
it throws this error: docker: Error response from daemon: exec: "nvidia-container-runtime-hook": executable file not found in $PATH. ERRO[0000] error waiting for container: context canceled
How can I fix that?
When I run it without the --gpus flag: docker run -it tensorflow/tensorflow:latest-gpu bash
So it says I have to install nvidia-container-toolkit:
On versions including and after 19.03, you will use the nvidia-container-toolkit package and the --gpus all flag
My docker -v output: Docker version 19.03.11
I tried running sudo apt-get install -y nvidia-container-runtime as described in the guide, but this occurred:
cuda-drivers is already the newest version (470.57.02-1).
cuda-drivers set to manually installed.
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
cuda-drivers-470 : Depends: libnvidia-gl-470 (>= 470.57.02) but it is not going to be installed
libnvidia-ifr1-470 : Depends: libnvidia-gl-470 but it is not going to be installed
nvidia-driver-470 : Depends: libnvidia-gl-470 (= 470.57.02-0ubuntu1) but it is not going to be installed
Recommends: nvidia-prime (>= 0.8) but it is not going to be installed
Recommends: libnvidia-compute-470:i386 (= 470.57.02-0ubuntu1) but it is not installable
Recommends: libnvidia-decode-470:i386 (= 470.57.02-0ubuntu1) but it is not installable
Recommends: libnvidia-encode-470:i386 (= 470.57.02-0ubuntu1) but it is not installable
Recommends: libnvidia-ifr1-470:i386 (= 470.57.02-0ubuntu1) but it is not installable
Recommends: libnvidia-fbc1-470:i386 (= 470.57.02-0ubuntu1) but it is not installable
Recommends: libnvidia-gl-470:i386 (= 470.57.02-0ubuntu1) but it is not installable
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).
I tried different ways to install and reinstall the NVIDIA drivers and CUDA, and I guess I made a mistake somewhere. This is what I see when I try installing with another installer:
So I'm not sure how to fix the “unmet dependencies” issue, because uninstalling and reinstalling doesn't solve it…
and I still get:
docker: Error response from daemon: exec: "nvidia-container-runtime-hook": executable file not found in $PATH.
ERRO[0000] error waiting for container: context canceled
when running docker run --gpus all -it tensorflow/tensorflow:latest-gpu bash
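The Docker error says the hook executable is not on $PATH, so one quick check is whether it exists at all for the current user. This is a minimal sketch: nvidia-container-runtime-hook is the name from the error message, while nvidia-container-toolkit as a binary name is an assumption, and locate is a hypothetical helper.

```python
import shutil

# "nvidia-container-runtime-hook" is the name from the Docker error above.
# "nvidia-container-toolkit" as an executable name is an assumption.
HOOK_NAMES = ["nvidia-container-runtime-hook", "nvidia-container-toolkit"]


def locate(names):
    """Return a dict mapping each name to its full path on PATH, or None."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate(HOOK_NAMES).items():
        print(name, "->", path if path else "not found on PATH")
```

Note that this checks the PATH of the current shell, which may differ from the PATH the Docker daemon runs with.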