Could you please let us know in which environment you are facing the issue? Also, I can see that you are using CUDA 12.3. Could you please try CUDA 12.2? Thank you.
RHEL 9, Python 3.9, cuDNN 12.9, and I just upgraded CUDA to 12.4. Oddly, I have 2 GPU nodes that work just fine. The fix there was to set CUDNN_PATH and add $CUDNN_PATH/lib to $LD_LIBRARY_PATH. This one node has the same packages, but clearly there is some difference that I can’t find:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-02-26 14:35:16.580905: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-02-26 14:35:16.581143: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-26 14:35:16.584441: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-26 14:35:16.625948: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-26 14:35:17.760304: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
Notice that the log:
Could not find cuda drivers on your machine, GPU will not be used.
is repeated twice. That indicates 2 devices, and indeed there are 2 GPUs. There has to be an env variable I am missing.
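One way to narrow down which library the "Cannot dlopen some GPU libraries" warning refers to is to try loading each one directly and to confirm that $CUDNN_PATH/lib really is on $LD_LIBRARY_PATH on the failing node. The following is only a rough sketch: the library file names in CANDIDATE_LIBS are assumptions for a CUDA 12 / cuDNN 8 stack and may not match what is installed, and the helper names are made up for illustration.

```python
import ctypes
import os

# Assumed sonames for a CUDA 12 / cuDNN 8 install; adjust to your system.
CANDIDATE_LIBS = [
    "libcuda.so.1",
    "libcudart.so.12",
    "libcublas.so.12",
    "libcufft.so.11",
    "libcusparse.so.12",
    "libcudnn.so.8",
]


def probe_libraries(names):
    """Try to dlopen each library; map name -> None on success, else the error text."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


def cudnn_lib_on_path(environ):
    """Return True if $CUDNN_PATH/lib is one of the entries in $LD_LIBRARY_PATH."""
    cudnn_path = environ.get("CUDNN_PATH")
    if not cudnn_path:
        return False
    wanted = os.path.normpath(os.path.join(cudnn_path, "lib"))
    entries = environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return any(os.path.normpath(entry) == wanted for entry in entries if entry)


if __name__ == "__main__":
    print("CUDNN_PATH/lib on LD_LIBRARY_PATH:", cudnn_lib_on_path(os.environ))
    for name, error in probe_libraries(CANDIDATE_LIBS).items():
        print(f"{name}: {'ok' if error is None else 'FAILED: ' + error}")
```

Running this on a working node and on the failing node, then comparing the output, should show whether the difference is a missing file or a missing path entry.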
Hi @RtheK, as per the tested build configurations, TensorFlow 2.16.1 supports cuDNN 8.9 and CUDA 12.3, but you are trying CUDA 12.4, which is incompatible with 2.16. Could you please try CUDA 12.3? Thank you.
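To check this kind of mismatch on a given node, the versions the installed TensorFlow wheel was built against can be compared with what is on the machine. Below is a minimal sketch, assuming `tf.sysconfig.get_build_info()` reports `cuda_version` and `cudnn_version` for your build (CPU-only builds may omit them). The `version_mismatches` helper is hypothetical, and the installed CUDA version is hard-coded from this thread rather than detected.

```python
def version_mismatches(expected, installed):
    """Return {component: (expected, installed)} where the versions disagree.

    Only as many dotted parts as the expected version specifies are compared,
    so an expected "8" matches an installed "8.9".
    """
    mismatches = {}
    for component, want in expected.items():
        if want is None:
            continue  # build info did not report this component
        have = installed.get(component)
        want_parts = str(want).split(".")[:2]
        have_parts = str(have).split(".")[: len(want_parts)] if have is not None else None
        if have_parts != want_parts:
            mismatches[component] = (want, have)
    return mismatches


if __name__ == "__main__":
    # Installed version as reported in this thread; replace with your own.
    installed = {"cuda": "12.4"}
    try:
        import tensorflow as tf

        build = tf.sysconfig.get_build_info()
        expected = {"cuda": build.get("cuda_version")}
        print("Built against:", expected)
        print("Mismatches:", version_mismatches(expected, installed))
    except ImportError:
        print("TensorFlow is not importable in this environment.")
```

An empty result only means the major.minor numbers line up; it does not prove that the libraries can actually be loaded.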