Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice

Hi, I have a question related to the topic in

https://github.com/tensorflow/tensorflow/issues/56927

but I am using Ubuntu 22.04.

After a fresh installation (I was on 18.04 until a cataclysmic event), I seem to be having a similar issue.
I am trying to use DeepXDE with the TensorFlow backend.

(A lot of what follows is background and can be skimmed.)
I am trying to run one of their examples, and my attention is drawn to the portion of the output where "nvvm" is mentioned.

chaztikov@priority:~/git/deepxde/examples/pinn_forward$ python3 Burgers_RAR.py
Using backend: tensorflow

Enable just-in-time compilation with XLA.

2022-09-10 23:10:14.449892: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-10 23:10:15.157727: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2022-09-10 23:10:15.157784: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10113 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1
Compiling model...
'compile' took 0.000428 s

Training model…

WARNING:tensorflow:AutoGraph could not transform <function <lambda> at 0x7f736f697250> and will run it as-is.
Cause: could not parse the source code of <function <lambda> at 0x7f736f697250>: no matching AST found among candidates:

# coding=utf-8

lambda x, on: np.array([on_boundary(x[i], on[i]) for i in range(len(x))])
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function <lambda> at 0x7f736f697490> and will run it as-is.
Cause: could not parse the source code of <function <lambda> at 0x7f736f697490>: no matching AST found among candidates:

# coding=utf-8

lambda x, on: np.array([on_boundary(x[i], on[i]) for i in range(len(x))])
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
2022-09-10 23:10:16.768843: I tensorflow/compiler/xla/service/service.cc:170] XLA service 0x55a2ebd8d3a0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-09-10 23:10:16.768872: I tensorflow/compiler/xla/service/service.cc:178] StreamExecutor device (0): NVIDIA GeForce GTX 1080 Ti, Compute Capability 6.1
2022-09-10 23:10:16.797603: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:263] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY to enable.
2022-09-10 23:10:17.433963: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-10 23:10:17.434943: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-10 23:10:17.434971: W tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2022-09-10 23:10:17.435713: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-10 23:10:17.435784: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] INTERNAL: Failed to launch ptxas
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location. This message will be only logged once.
2022-09-10 23:10:17.440252: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-10 23:10:17.440280: W tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2022-09-10 23:10:17.441066: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-10 23:10:17.441134: W tensorflow/compiler/xla/service/gpu/buffer_comparator.cc:640] INTERNAL: Failed to launch ptxas
Relying on driver to perform ptx compilation.
Setting XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda or modifying $PATH can be used to set the location of ptxas. This message will only be logged once.
2022-09-10 23:10:17.534116: W tensorflow/compiler/xla/service/gpu/nvptx_helper.cc:56] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
./cuda_sdk_lib
/usr/local/cuda-11.2
/usr/local/cuda
.
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2022-09-10 23:10:17.721031: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-10 23:10:17.721072: W tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2022-09-10 23:10:17.721699: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-10 23:10:17.722104: F tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:456] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: Failed to launch ptxas' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
Aborted (core dumped)

I did not run "conda init" for my ~/.bashrc, because I found that doing so interfered with an otherwise successful installation of NVIDIA CUDA, cudatoolkit, etc.
(Though I am afraid to break anything, I welcome suggestions on this point and on all points.)

I see that I do have nvvm, as indicated below:

chaztikov@priority:~/git/deepxde/examples/pinn_forward$ locate /nvvm/libdevice
/home/chaztikov/anaconda3/nvvm/libdevice
/home/chaztikov/anaconda3/nvvm/libdevice/libdevice.10.bc
/home/chaztikov/anaconda3/pkgs/cuda-nvcc-11.7.99-0/nvvm/libdevice
/home/chaztikov/anaconda3/pkgs/cuda-nvcc-11.7.99-0/nvvm/libdevice/libdevice.10.bc
chaztikov@priority:~/git/deepxde/examples/pinn_forward$

so I will set, in ~/.bashrc:
export CUDA_DIR="/home/chaztikov/anaconda3/pkgs/cuda-nvcc-11.7.99-0/"

I tried the above, and it didn't work; I am still getting the same error message.
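Re-reading the log, it looks like XLA wants XLA_FLAGS=--xla_gpu_cuda_data_dir rather than CUDA_DIR, so presumably something like this is what it is asking for (an untested sketch, pointing at the anaconda3 prefix from my locate output above, since that directory contains nvvm/libdevice):

import os

# Untested sketch: point XLA at a directory that *contains* nvvm/libdevice
# (per the locate output above), before TensorFlow is imported.
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=/home/chaztikov/anaconda3"

import tensorflow as tf  # XLA reads XLA_FLAGS when it initializes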

Note: both before and after that change (export CUDA_DIR, etc.) to my ~/.bashrc, TensorFlow still seems to locate the GPU.

chaztikov@priority:~/git/deepxde/examples/pinn_forward$ python3
Python 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import tensorflow
>>> tensorflow.device('GPU')
2022-09-11 21:07:57.423925: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-11 21:07:57.966877: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9858 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1
<tensorflow.python.eager.context._EagerDeviceContext object at 0x7fbd3557d9c0>

As a side note, torch also seems to work:

chaztikov@priority:~/git/deepxde/examples/pinn_forward$ python3
Python 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import torch; torch.cuda.is_available()
True

EDIT: I re-read the output when importing tensorflow. Is it indicating that the GPU is found but somehow not used? It doesn't seem that way; it seems to be saying that this prebuilt TensorFlow binary doesn't enable every available CPU optimization, which looks informational rather than an error.


Please help, I’d really appreciate it, as this is holding up progress on a time-sensitive project :grimacing: Thanks!

Could you share the output of this snippet?

import tensorflow as tf
tf.config.list_physical_devices('GPU')

For what it's worth, I solved the same problem by first creating the nvvm/libdevice folder inside my Conda environment's lib folder, then copying the `libdevice.10.bc` file into that directory.

Next, I set

export XLA_FLAGS=--xla_gpu_cuda_data_dir=/home//miniconda3/envs//lib

in Bash, within my activated Conda environment.

Based on the error-message snippet `Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice.`, it is not sufficient to make XLA_FLAGS point at the file itself; the folder structure must match, i.e. the directory you point at must contain nvvm/libdevice.
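In code form, the procedure is roughly the following (a sketch; the environment path uses a hypothetical env name "tf" -- adjust it to your own setup):

import os
import pathlib
import shutil

# Hypothetical env location; "tf" is a placeholder env name.
env_lib = pathlib.Path.home() / "miniconda3/envs/tf/lib"

# 1. Create nvvm/libdevice inside the environment's lib folder.
libdevice_dir = env_lib / "nvvm" / "libdevice"
libdevice_dir.mkdir(parents=True, exist_ok=True)

# 2. Copy libdevice.10.bc into it (the conda cudatoolkit package puts it directly in lib/).
shutil.copy(env_lib / "libdevice.10.bc", libdevice_dir)

# 3. Point XLA at the directory that *contains* nvvm/, before TensorFlow is imported.
os.environ["XLA_FLAGS"] = f"--xla_gpu_cuda_data_dir={env_lib}"

import tensorflow as tf  # XLA reads XLA_FLAGS when it initializes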

This procedure is more or less described in numerous other similar issues.


After fixing my libdevice issue, I ran into the Couldn't invoke ptxas --version issue. Still using Conda, as recommended by the official docs. I solved that by running

conda install -c nvidia cuda-nvcc

in my activated Conda/TensorFlow environment. I got that recipe from this GitHub issue:
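To check that the install actually took effect, a quick sanity check (just a sketch) is to ask for the ptxas version from within the environment:

import subprocess

# After `conda install -c nvidia cuda-nvcc`, ptxas should be on the
# environment's PATH and answer --version without an error.
result = subprocess.run(["ptxas", "--version"], capture_output=True, text=True)
print(result.stdout or result.stderr)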


Hi guys, I've done two fresh installs (Install TensorFlow with pip) on two laptops: an RTX machine (Ubuntu/Anaconda/Jupyter) and a GTX machine (Ubuntu/Miniconda/Jupyter).

Both have the same issue. nvidia-smi works but shows CUDA 12.0.

import tensorflow as tf; print(tf.config.list_physical_devices('GPU')) -> works.

At model.fit(x_train, y_train, epochs=5) I get the same issue:

tensorflow/compiler/xla/service/gpu/nvptx_helper.cc:56] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
./cuda_sdk_lib
/usr/local/cuda-11.2
/usr/local/cuda
.
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.

I've run both

sudo cp -r /home/steve/anaconda3.2202.10/pkgs/cudatoolkit-11.2.2-hbe64b41_10/lib/ /usr/local/cuda

and

export XLA_FLAGS=--xla_gpu_cuda_data_dir=/anaconda3.2202.10/pkgs/cudatoolkit-11.2.2-hbe64b41_10/lib/libdevice.10.bc

and still getting the same error message.

TensorFlow 2.10 native still works great.

I would really like to move to the Linux platform to keep using the latest TF features. Any advice?

OK guys, for WSL2, tf==2.10 works.

Using this code:

import tensorflow as tf
tf.config.list_physical_devices('GPU')
sys_details = tf.sysconfig.get_build_info()
cuda = sys_details["cuda_version"]
cudnn = sys_details["cudnn_version"]
print(cuda, cudnn)

confirms "11.2 8"

whereas for tf==2.11

it gives "64_112 64_8"

Let's hope this is not a monopolistic attack by Google against Microsoft.


OK, found this solution for tf==2.11:

optimizer=tf.keras.optimizers.legacy.Adam()

NOT optimizer="adam"

So, as a self-taught coder, it is interesting to see which part of TensorFlow calls the GPU.
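Here is a minimal sketch of what I mean (the model itself is just a placeholder):

import tensorflow as tf

# Placeholder model; the point is only the optimizer argument.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])

# Use the legacy Adam class instead of the string alias "adam". In TF 2.11
# the string resolves to the new Keras optimizer, which reportedly JIT-compiles
# its update step with XLA and so trips over the missing libdevice/ptxas setup.
model.compile(
    optimizer=tf.keras.optimizers.legacy.Adam(),  # NOT optimizer="adam"
    loss="mse",
)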


@Steven_Cohen this fix worked! I was trying to reproduce the official DDPM example on the Keras website on TF 2.11/Python 3.10 using Miniconda3. Do you have any insights as to why legacy worked?

I just followed the installation instructions for tf12 (Install TensorFlow with pip), and when I run a simple optimizers.Adam() I get: 2023-04-09 18:02:22.272960: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:530] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice. Also, when I look in the directory, there is no nvcc file downloaded by NVIDIA.

I've tried looking through "\\wsl.localhost\Ubuntu\home\steve\anaconda3.23.03tf12\lib\python3.10\site-packages\keras\optimizers\adam.py" for a solution, but it's a bit above my pay grade.

I have also noticed that for Windows native, with tf.keras.optimizers.experimental.Adam(), the same error occurs: InternalError: Graph execution error: ... Node: 'StatefulPartitionedCall_2'
libdevice not found at ./libdevice.10.bc
[[{{node StatefulPartitionedCall_2}}]] [Op:__inference_train_function_739].

But at least I can find "C:\Users\sjc52\anaconda3.2022.10\pkgs\cuda-nvcc-11.7.99-0".

Thank you for your response! If you do figure out why, please share!