How to convert XLA HLO debug code to executable?

Zhaozheng09 · April 7, 2022, 2:01am

Background: We train our model with TF 1.15. Unfortunately we cannot upgrade to TF 2.8, because our custom TensorFlow build adds some new functions.
Question: I have a large model and want to speed up training with XLA, but I hit a CUDA_ERROR_ILLEGAL_ADDRESS when XLA is enabled, and I would like to know how to solve it.
Here is the information I have so far:

I have the name of the failing XLA cluster.
I have all inputs of the failing XLA cluster.
I dumped all XLA HLO code with export XLA_FLAGS="--xla_dump_to=./xla".
For easier debugging, I want a method that converts the IR code into an executable.
Debugging XLA is very, very difficult, please help.
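
For reference, the dump setup above can be sketched as a small Python helper. This is only a sketch under assumptions: `build_xla_flags` and `find_module_dumps` are hypothetical helper names, and the assumption that dump files start with a `module_NNNN` prefix is taken from the `module_0023` archive name mentioned in this post; the exact file names may differ between XLA versions. Setting `XLA_FLAGS` before TensorFlow is imported is the safe ordering.

```python
import os
from pathlib import Path


def build_xla_flags(dump_dir, extra_flags=()):
    """Build an XLA_FLAGS value that dumps HLO into dump_dir.

    Hypothetical helper: only --xla_dump_to comes from the post;
    anything passed via extra_flags is the caller's responsibility.
    """
    flags = ["--xla_dump_to={}".format(dump_dir)]
    flags.extend(extra_flags)
    return " ".join(flags)


def find_module_dumps(dump_dir, module_id):
    """Return the sorted dump files for one module, e.g. module_0023.

    Assumes dump file names start with 'module_<4-digit id>'.
    """
    prefix = "module_{:04d}".format(module_id)
    return sorted(
        p for p in Path(dump_dir).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )


if __name__ == "__main__":
    # Set the flag before importing TensorFlow so XLA picks it up.
    os.environ["XLA_FLAGS"] = build_xla_flags("./xla")
    print(os.environ["XLA_FLAGS"])

    # After a training run, list everything dumped for module 23.
    if Path("./xla").is_dir():
        for path in find_module_dumps("./xla", 23):
            print(path)
```

Collecting only the files for the failing module keeps the debug archive small when the dump directory contains hundreds of clusters.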

Other info:
cluster_728 has 30 ops.
The problem does not reproduce when I narrow down the cluster.
The problem also does not reproduce when I reduce the batch_size.
Debug file for cluster_728: module_0023.tar.gz
