How to get an NVIDIA A100 running with TensorFlow 2

Hello,
I have a new server (Debian) with an NVIDIA A100 and cannot get TensorFlow 2 (Keras) running:

python3 -m venv .venv
source .venv/bin/activate
pip install tensorflow-gpu
python run-tf.py

The same script works with PyTorch, but PyTorch provides special PyPI wheels that support the NVIDIA A100. I guess something similar exists for TensorFlow, but I don’t know where.

How do I run TensorFlow on an NVIDIA GPU, and why is TensorFlow not using the GPU?
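A quick way to narrow down "why is TensorFlow not using the GPU" is to ask TensorFlow which GPUs it can see and where a small operation actually runs. This is a minimal sketch assuming TensorFlow 2.x; gpu_names and matmul_device are illustrative helper names, not part of TensorFlow.

```python
def gpu_names():
    """Names of the GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [gpu.name for gpu in tf.config.list_physical_devices("GPU")]


def matmul_device():
    """Run a small matmul pinned to the first GPU and return the device it ran on."""
    import tensorflow as tf

    with tf.device("/GPU:0"):
        x = tf.random.uniform((256, 256))
        y = tf.matmul(x, x)
    # TensorFlow may silently fall back to the CPU (soft device placement),
    # so check the device string instead of assuming the GPU was used.
    return y.device


if __name__ == "__main__":
    names = gpu_names()
    if names is None:
        print("TensorFlow is not installed in this environment")
    elif not names:
        print("TensorFlow sees no GPU; look for 'Could not load dynamic library' warnings in its log")
    else:
        print("GPUs:", names)
        print("matmul ran on:", matmul_device())
```

An empty GPU list usually means TensorFlow could not load the CUDA or cuDNN libraries it was built against, which is what the version question below is getting at.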

You don’t need the tensorflow-gpu package: pip install tensorflow supports both CPU and GPU (see Install TensorFlow 2). But I would use the official TensorFlow Docker image. TensorFlow versions 2.5 to 2.9 (the current release) are built against cuDNN 8.1 and CUDA 11.2, but that can change in the future. Being able to switch TF versions without fiddling with CUDA is great. It also narrows the gap between remote and local development, so there’s less overhead when you switch between the two.
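To make the version pairing concrete, here is a small lookup sketch. It encodes only what is stated above (TensorFlow 2.5 to 2.9 built against CUDA 11.2 and cuDNN 8.1); TESTED_BUILDS and expected_gpu_libs are hypothetical names, and entries for other releases would have to come from TensorFlow's official list of tested build configurations.

```python
# Only the pairing stated in this thread: TF 2.5 to 2.9 -> CUDA 11.2, cuDNN 8.1.
# Other releases are deliberately absent and return None.
TESTED_BUILDS = {
    (2, minor): {"cuda": "11.2", "cudnn": "8.1"} for minor in range(5, 10)
}


def expected_gpu_libs(tf_version):
    """Return the CUDA/cuDNN versions a TensorFlow release was built with, or None if unknown."""
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return TESTED_BUILDS.get((major, minor))


print(expected_gpu_libs("2.9.1"))  # {'cuda': '11.2', 'cudnn': '8.1'}
```

Keying on (major, minor) keeps patch releases such as 2.9.1 mapped to the same CUDA/cuDNN pair.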

But what are the Python, cuDNN, and CUDA versions in your environment?
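One way to gather those versions from inside the environment is sketched below. It assumes a TensorFlow 2 release that provides tf.sysconfig.get_build_info(); that call reports the CUDA and cuDNN versions TensorFlow was built against, not necessarily the ones installed, and the driver version still has to be checked with nvidia-smi. environment_versions is a hypothetical helper name.

```python
import platform


def environment_versions():
    """Collect the Python version and, if TensorFlow is installed,
    the CUDA/cuDNN versions that TensorFlow build expects."""
    versions = {"python": platform.python_version()}
    try:
        import tensorflow as tf
    except ImportError:
        return versions
    versions["tensorflow"] = tf.__version__
    build = tf.sysconfig.get_build_info()
    # CPU-only builds do not report these keys, hence .get()
    versions["cuda"] = build.get("cuda_version")
    versions["cudnn"] = build.get("cudnn_version")
    return versions


if __name__ == "__main__":
    for key, value in environment_versions().items():
        print(f"{key}: {value}")
```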


Thanks, Mog, for the hint about using Docker containers. That makes sense.

In case anyone doesn’t want to use containers, it is also possible to get the NVIDIA libraries through Miniconda. The CUDA toolkit and cuDNN libraries are packaged for Conda, but not for PyPI. (The NVIDIA GPU driver itself still has to be installed on the host.)

Install Miniconda in your user account:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# prevent conda autostart in shell
# conda config --set auto_activate_base false

Install pip in Conda, create a virtual environment, and install the packages:

# install pip in the conda base environment so conda's pip can stand in for the system pip
conda install -y pip

# install and activate a conda virtual env
conda create -y --name yourvenvname python=3.9 pip
conda activate yourvenvname

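# install the CUDA toolkit and cuDNN libraries (the NVIDIA driver must already be installed on the host)
conda install -y cudatoolkit=11.3.1 cudnn=8.3.2 -c conda-forge
# make the conda libraries visible to TensorFlow; this only lasts for the current shell
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/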

# if you need pytorch too
# pip install torch==1.12.1+cu113 torchvision torchaudio -f https://download.pytorch.org/whl/torch_stable.html

# install tensorflow
pip install tensorflow

# install other packages
# pip install -e .
# pip install -r requirements.txt --no-cache-dir
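After the install, a small check can confirm the two things the export line relies on: that the conda environment actually contains the CUDA and cuDNN libraries, and that $CONDA_PREFIX/lib is on LD_LIBRARY_PATH. This is a sketch; the file-name patterns and the helper names find_cuda_libs and on_ld_library_path are assumptions for illustration.

```python
import glob
import os

# Assumed Linux file-name patterns (e.g. libcudart.so.11.0, libcudnn.so.8);
# adjust them if your library versions are named differently.
PATTERNS = ("libcudart.so*", "libcublas.so*", "libcudnn.so*")


def find_cuda_libs(lib_dir):
    """Map each pattern to the sorted list of matching files in lib_dir."""
    return {p: sorted(glob.glob(os.path.join(lib_dir, p))) for p in PATTERNS}


def on_ld_library_path(lib_dir):
    """True if lib_dir (ignoring a trailing slash) is an entry of LD_LIBRARY_PATH."""
    entries = os.environ.get("LD_LIBRARY_PATH", "").split(":")
    target = os.path.normpath(lib_dir)
    # Skip empty entries, e.g. the leading one produced when the variable was unset.
    return any(entry and os.path.normpath(entry) == target for entry in entries)


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix is None:
        print("No conda environment is active")
    else:
        lib_dir = os.path.join(prefix, "lib")
        for pattern, files in find_cuda_libs(lib_dir).items():
            print(f"{pattern}: {files if files else 'MISSING'}")
        print("lib dir on LD_LIBRARY_PATH:", on_ld_library_path(lib_dir))
```

Run it inside the activated environment: a MISSING entry points at the conda install step, and False on the last line means the export was not run in this shell.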