(Reposted from stackoverflow due to non response there; similar unresolved issues from other users are also referenced)
Win 10 64-bit 21H1; TF2.5, CUDA 11 installed in environment (Python 3.9.5 Xeus)
I am not the only one seeing this error; see also (unanswered) here and here.
The issue is obscure and the proposed resolutions are unclear/don’t seem to work (see e.g. here)
Issue Using the TF Linear_Mixed_Effects_Models.ipynb example (download from TensorFlow github here) execution reaches the point of performing the “warm up stage” then throws the error:
InternalError: libdevice not found at ./libdevice.10.bc [Op:__inference_one_e_step_2806]
The console contains this output showing that it finds the GPU but XLA initialisation fails to find the - existing! - libdevice in the specified paths
2021-08-01 22:04:36.691300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9623 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-08-01 22:04:37.080007: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
2021-08-01 22:04:54.122528: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x1d724940130 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-08-01 22:04:54.127766: I tensorflow/compiler/xla/service/service.cc:177] StreamExecutor device (0): NVIDIA GeForce GTX 1080 Ti, Compute Capability 6.1
2021-08-01 22:04:54.215072: W tensorflow/compiler/tf2xla/kernels/random_ops.cc:241] Warning: Using tf.random.uniform with XLA compilation will ignore seeds; consider using tf.random.stateless_uniform instead if reproducible behavior is desired.
2021-08-01 22:04:55.506464: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2021-08-01 22:04:55.512876: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2021-08-01 22:04:55.517387: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77] C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin
2021-08-01 22:04:55.520773: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77] C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
2021-08-01 22:04:55.524125: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77] .
2021-08-01 22:04:55.526349: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
Now the interesting thing is that the paths searched includes “C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin”
the content of that folder includes all the (successfully loaded at TF startup) DLLs, including cudart64_110.dll, dudnn64_8.dll… and of course libdevice.10.bc
Question Since TF says it is searching this location for this file and the file exists there, what is wrong and how do I fix it?
(NB C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
does not exist… CUDA is intalled in the environment; this path must be a best guess for an OS installation)
Info: I am setting the path by
aPath = '--xla_gpu_cuda_data_dir=C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin'
print(aPath)
os.environ['XLA_FLAGS'] = aPath
but I have also set an OS environment variable XLA_FLAGS to the same string value… I don’t know which one is actually working yet, but the fact that the console output says it searched the intended path is good enough for now