I’ve been struggling with the problem described below for many days and would appreciate your help.
What I want to do is use TensorFlow with GPU support in Docker on Ubuntu.
My GPU is a GeForce GTX 1070, and my OS is Ubuntu 22.04.3 LTS.
I’ve installed Docker:
$ docker --version
Docker version 26.1.1, build 4cf5afa
Before I started, I removed every existing NVIDIA/CUDA-related package:
$ sudo apt-get -y --purge remove nvidia*
$ sudo apt-get -y --purge remove cuda*
$ sudo apt-get -y --purge remove cudnn*
$ sudo apt-get -y --purge remove libnvidia*
$ sudo apt-get -y --purge remove libcuda*
$ sudo apt-get -y --purge remove libcudnn*
$ sudo apt-get autoremove
$ sudo apt-get autoclean
$ sudo apt-get update
$ sudo rm -rf /usr/local/cuda*
$ pip uninstall tensorflow-gpu
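To double-check that nothing was left behind, I believe a package query like this should come back empty after the purge (just a sketch, not something I’m certain is the authoritative check):
$ dpkg -l | grep -iE 'nvidia|cuda'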
Afterward, I installed the NVIDIA driver:
$ sudo apt install nvidia-driver-535
And nvidia-smi works fine; it reports:
NVIDIA-SMI 545.29.06    Driver Version: 545.29.06    CUDA Version: 12.3
To install the NVIDIA Container Toolkit, I followed the instructions at GitHub - NVIDIA/nvidia-docker: Build and run Docker containers leveraging NVIDIA GPUs (deprecated) and at Installing the NVIDIA Container Toolkit — NVIDIA Container Toolkit 1.15.0 documentation.
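For reference, the installation steps from that documentation look roughly like this (paraphrased from memory, after adding the NVIDIA apt repository as described on the linked page; the exact repository-setup lines are there):
$ sudo apt-get update
$ sudo apt-get install -y nvidia-container-toolkit
$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker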
Then I ran a container
$ docker container run --rm -it --name tf --mount type=bind,source=/home/(myname)/docker/tensorflow,target=/bindcont tensorflow/tensorflow:2.15.0rc1-gpu bash
When I ran sample.py (shown below), I got
# python sample.py
2024-05-02 13:46:01.669548: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-02 13:46:01.689375: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-05-02 13:46:01.689395: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-05-02 13:46:01.690008: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-02 13:46:01.693281: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-02 13:46:01.693384: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-02 13:46:02.374705: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:274] failed call to cuInit: UNKNOWN ERROR (34)
tf.Tensor(
[[1.]
 [1.]], shape=(2, 1), dtype=float32)
Here is sample.py:
# cat sample.py
import os
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'
import tensorflow as tf
x = tf.ones(shape=(2, 1))
print(x)
It seems my GPU is not recognized and TensorFlow with GPU support does not work properly.
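A quick way to confirm this inside the container is presumably a one-liner like the following (just a sketch using the standard tf.config API; I expect it to print an empty list in my case):
# python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"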
According to Docker | TensorFlow, I shouldn’t need to install the NVIDIA® CUDA® Toolkit, but the error message says “Could not find cuda drivers on your machine”, which doesn’t make sense to me.
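I’m also not sure how to verify from inside the container whether the driver was made available at all; I assume checks like these from a shell in the same container would tell me (again just a sketch):
# ldconfig -p | grep libcuda
# nvidia-smi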
Could anyone help me and suggest what I should do?