I’m still trying to build TensorFlow with GPU support from source and am not making any progress. I am running Mint 21 (based on Ubuntu 22.04). Can anyone who has actually built it explain how to do it? The online docs are either incomplete or so full of “ifs” that I don’t know what I am actually supposed to do. I am trying to install the build environment in a conda environment to avoid hosing something important.
I have installed the 535 drivers and nvidia-cuda-toolkit using apt. The following works:
(tftest)$ nvidia-smi
Sat Oct 21 21:33:30 2023
+--------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05 Driver Version: 535.86.05 CUDA Version: 12.2 |
(tftest) $ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
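Note that the CUDA version printed by nvidia-smi is the highest version the driver supports, while nvcc reports the toolkit that is actually installed, so the two numbers can legitimately differ. A minimal sketch for pulling the toolkit release out of the nvcc output, which is presumably the number the configure script is asking about; the function name `nvcc_release` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read `nvcc --version` output on stdin and print
# only the toolkit release number.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Demo on the line copied from the output above; prints 11.5
printf '%s\n' 'Cuda compilation tools, release 11.5, V11.5.119' | nvcc_release

# Real usage, assuming nvcc is on PATH:
#   nvcc --version | nvcc_release
```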
I ran an nvidia docker image that runs a benchmark, to verify the GPU is working:
docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
Windowed mode
Simulation data stored in video memory
Single precision floating point simulation
1 Devices used for simulation
GPU Device 0: "Pascal" with compute capability 6.1
Compute 6.1 CUDA device: [NVIDIA GeForce GTX 1060 6GB]
10240 bodies, total time for 10 iterations: 7.831 ms
= 133.909 billion interactions per second
= 2678.175 single-precision GFLOP/s at 20 flops per interaction
I have a supported GPU and it works.
I pulled the TensorFlow source from the git repository and, after much back and forth with nothing working, decided to go with version 2.11, since I have been able to get that to build with CPU-only support. I tried to install CUDA and cuDNN in my conda environment as follows:
conda install -c nvidia cuda-python=11.5
conda install -c nvidia cudnn=8.1
conda install -c nvidia cudatoolkit=11.5
However, ~/anaconda3/envs/tftest/include does NOT have cuda.h. What do I install to get it?
I do have a cuda.h (version 11.5) in /usr/include; I think it must have come from running apt install nvidia-cuda-toolkit. But I don’t want to hose my system by installing a bunch of incompatible or conflicting junk in /usr.
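To make the header situation concrete, a small check along these lines shows which candidate base paths actually contain include/cuda.h. It is only a sketch: the helper name `prefixes_with_cuda_h` is made up, and it assumes the header sits directly under `include/` of each prefix, which may not hold for every package layout.

```shell
#!/bin/sh
# Hypothetical helper: for each base path given as an argument, print the
# path if it contains include/cuda.h; print nothing for paths that do not.
prefixes_with_cuda_h() {
    for prefix in "$@"; do
        if [ -f "$prefix/include/cuda.h" ]; then
            printf '%s\n' "$prefix"
        fi
    done
}

# The two base paths from this post; adjust to your own environment.
prefixes_with_cuda_h /usr "$HOME/anaconda3/envs/tftest"
```

Any prefix that is not printed has no cuda.h at that location.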
When I run configure in the tensorflow source directory, it asks the following questions:
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 11]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 2]:
Please specify the locally installed NCCL version you want to use. [Leave empty to use http://github.com/nvidia/nccl]:
Please specify the comma-separated list of base paths to look for CUDA libraries and headers. [Leave empty to use the default]: /usr,/home/myname/anaconda3/envs/tftest
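For reference, configure can also take these answers from environment variables instead of prompting, which at least makes each attempt repeatable. The sketch below uses variable names read by TensorFlow's configure.py (TF_NEED_CUDA, TF_CUDA_VERSION, TF_CUDNN_VERSION, TF_CUDA_PATHS). The values are simply the ones from this post, with `$HOME` standing in for the home directory; they are an assumption, not a known-good configuration.

```shell
#!/bin/sh
# Answers to the configure prompts, supplied as environment variables.
# Values mirror what this post installed; they are NOT verified to build.
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=11.5     # toolkit release reported by nvcc
export TF_CUDNN_VERSION=8.1     # version requested from conda
export TF_CUDA_PATHS="/usr,$HOME/anaconda3/envs/tftest"

# Show the TF_ variables that configure would see.
env | grep '^TF_' | sort

# Next step (not run here): ./configure in the TensorFlow source directory.
```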
What are the correct answers to these questions? No matter what I enter, it only responds that something is missing, inconsistent, or conflicts with something else, then repeats the whole thing again. For example:
Inconsistent CUDA toolkit path: /usr vs /usr/lib
Asking for detailed CUDA configuration…