CUDA 12.3 and Python 3.9, with any TF, GPU not found

RtheK · February 21, 2024, 4:01am

Is there a way to get the GPUs to work? Python 3.11 works flawlessly. This is on RHEL 9.


[GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] on linux

Type "help", "copyright", "credits" or "license" for more information.

>>> import os

>>>

>>> # Set TensorFlow log level to suppress warnings and info messages

>>> os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

>>>

>>> # Now you can import and use TensorFlow

>>> import tensorflow as tf

>>> print(tf.sysconfig.get_build_info())

OrderedDict([('cpu_compiler', '/usr/lib/llvm-17/bin/clang'), ('cuda_compute_capabilities', ['sm_50', 'sm_60', 'sm_70', 'sm_80', 'compute_90']), ('cuda_version', '12.3'), ('cudnn_version', '8'), ('is_cuda_build', True), ('is_rocm_build', False), ('is_tensorrt_build', True)])

>>>

>>> physical_devices = tf.config.list_physical_devices('GPU')

>>> print(physical_devices)

[]

>>>

pip show tf-nightly

Name: tf-nightly

Version: 2.16.0

Summary: TensorFlow is an open source machine learning framework for everyone.

Home-page: https://www.tensorflow.org/

Author: Google Inc.

Author-email: packages@tensorflow.org

License: Apache 2.0

Location: /usr/local/lib64/python3.9/site-packages

Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras-nightly, libclang, ml-dtypes, numpy, opt-einsum, packaging, protobuf, requests, setuptools, six, tb-nightly, tensorflow-io-gcs-filesystem, termcolor, typing-extensions, wrapt

Required-by:

Kiran_Sai_Ramineni · February 26, 2024, 6:50am

Hi @RtheK, I was able to detect the GPU with TensorFlow 2.16 nightly.

Could you please let us know in which environment you are facing the issue? I can also see that you are using CUDA 12.3. Could you please try CUDA 12.2 instead? Thank you.

RtheK · February 26, 2024, 7:45pm

RHEL 9, Python 3.9, cuDNN 12.9, and I just upgraded CUDA to 12.4. Oddly, I have 2 GPU nodes that work just fine. The fix there was to set CUDNN_PATH and add $CUDNN_PATH/lib to $LD_LIBRARY_PATH. This one node has the same packages, but clearly there is some difference that I can’t find:

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

2024-02-26 14:35:16.580905: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.

2024-02-26 14:35:16.581143: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.

2024-02-26 14:35:16.584441: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.

2024-02-26 14:35:16.625948: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.

To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

2024-02-26 14:35:17.760304: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.

Skipping registering GPU devices...

[]

Notice that the log:

Could not find cuda drivers on your machine, GPU will not be used.

appears twice. That indicates 2 devices, and indeed there are 2 GPUs. There has to be an env variable I am missing.
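One way to hunt for the difference between the working nodes and the broken one is to run the same small check on each node and diff the output. What follows is only a sketch: it prints CUDNN_PATH and the LD_LIBRARY_PATH entries, then tries to open a few libraries through ctypes, which goes through the same dynamic loader that produces the "Cannot dlopen some GPU libraries" warning. The sonames in DEFAULT_LIBS are assumptions for a CUDA 12.x / cuDNN 8 install and may differ on your system; probe_library and report are hypothetical helper names, not part of TensorFlow.

```python
import ctypes
import os

# Assumed sonames for a CUDA 12.x / cuDNN 8 install; adjust to your system.
DEFAULT_LIBS = ("libcuda.so.1", "libcudart.so.12", "libcublas.so.12", "libcudnn.so.8")


def probe_library(name):
    """Return (True, "") if the dynamic loader can open `name`, else (False, reason)."""
    try:
        ctypes.CDLL(name)
    except OSError as exc:
        return False, str(exc)
    return True, ""


def report(libs=DEFAULT_LIBS, env=None):
    """Collect environment settings and loader results as a list of text lines."""
    env = os.environ if env is None else env
    cudnn_path = env.get("CUDNN_PATH", "<unset>")
    lines = ["CUDNN_PATH=" + cudnn_path]
    # Split LD_LIBRARY_PATH and flag entries that do not exist on this node.
    entries = [p for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p]
    for entry in entries:
        status = "ok" if os.path.isdir(entry) else "missing directory"
        lines.append("LD_LIBRARY_PATH entry: {} ({})".format(entry, status))
    # Ask the loader for each library and record the error text on failure.
    for lib in libs:
        ok, reason = probe_library(lib)
        status = "loadable" if ok else "NOT loadable: " + reason
        lines.append("{}: {}".format(lib, status))
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Run it on a working node and on the failing node, then compare the two outputs line by line. Note that the loader reads LD_LIBRARY_PATH when the process starts, so the variable has to be exported in the shell before Python is launched; setting it from inside the interpreter does not change where libraries are searched.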

Kiran_Sai_Ramineni · April 15, 2024, 8:32am

Hi @RtheK, as per the tested build configurations, TensorFlow 2.16.1 supports cuDNN 8.9 and CUDA 12.3, but you are trying CUDA 12.4, which is incompatible with 2.16. Could you please try CUDA 12.3? Thank you.
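The version mismatch described above can be checked mechanically. This is a minimal sketch, assuming the installed toolkit version is already known as a string; cuda_matches_build and major_minor are hypothetical helpers, not TensorFlow APIs. It compares only major.minor against the cuda_version entry that tf.sysconfig.get_build_info() reports.

```python
def major_minor(version):
    """Reduce a version string such as '12.4.1' to the tuple (12, 4)."""
    parts = version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def cuda_matches_build(build_info, installed_cuda):
    """True if the installed CUDA major.minor equals the version TensorFlow was built with."""
    return major_minor(build_info["cuda_version"]) == major_minor(installed_cuda)


if __name__ == "__main__":
    # Build info as printed earlier in this thread; with TensorFlow installed,
    # obtain it from tf.sysconfig.get_build_info() instead.
    build_info = {"cuda_version": "12.3", "cudnn_version": "8"}
    for installed in ("12.3", "12.4"):
        print(installed, cuda_matches_build(build_info, installed))
```

Exact equality is a deliberately conservative check. It flags the 12.4 upgrade mentioned in this thread, but a mismatch on its own does not prove that the version difference is the cause of the missing GPUs.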

Sotiris_Gkouzias · April 16, 2024, 2:26pm

There is an open issue regarding GPU utilization here: TF 2.17.0 RC0 Fails to work with GPUs (and TF 2.16 too) · Issue #63362 · tensorflow/tensorflow · GitHub

The respective pull request (pending review), with revised instructions to pip install tensorflow[and-cuda] for Linux users with CUDA-enabled GPUs, is here: docs/site/en/install/pip.md at patch-1 · sgkouzias/docs · GitHub. I hope it helps!
