We are observing unexplained out-of-memory (OOM) events on the GPU when training a complex, large model (involving conditional execution) with XLA enabled (jit_compile=True on tf.function).
Unfortunately, we have not yet been able to reproduce the issue in a reduced, shareable form, so I am writing here mostly for feedback.
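For context, here is roughly how we enable XLA, with a toy model standing in for our real one (all names and shapes below are illustrative, not our actual setup):

```python
import tensorflow as tf

# Toy stand-in: a small dense model and a standard custom training step.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.build(input_shape=(None, 512))  # build eagerly so no variables are created inside the compiled step
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function(jit_compile=True)  # the flag in question; removing it falls back to the regular graph path
def train_step(x, y):
    # Our real model also has data-dependent branches, roughly of the form
    #   out = tf.cond(pred, lambda: branch_a(x), lambda: branch_b(x))
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss = loss_fn(y, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

x = tf.random.normal([256, 512])
y = tf.random.uniform([256], maxval=10, dtype=tf.int32)
train_step(x, y)  # first call traces and XLA-compiles the step
```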
What we see:
In general, GPU memory utilization for an XLA-compiled model goes down considerably compared to non-compiled graph mode or eager execution. This is what we measure in most of our models and in all of our small test cases.
However, in some instances, large models exceed GPU memory capacity when compiled, even though they still run in eager mode with the exact same batch size.
The two behaviors seem contradictory, so we are wondering whether there are known corner cases involving XLA that could produce this and that we could avoid (we really need the extra training efficiency that XLA provides).
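For reference, this is roughly how we compare peak memory between the plain-graph and XLA-compiled versions of a step (a minimal sketch: peak_gpu_mib and dense_step are made-up names, and it assumes TF 2.5+ with a visible GPU):

```python
import tensorflow as tf

def peak_gpu_mib(step_fn, *args, device="GPU:0"):
    """Run step_fn once and report the peak GPU memory (MiB) seen by TF's allocator."""
    tf.config.experimental.reset_memory_stats(device)
    step_fn(*args)  # the first call also triggers tracing/compilation
    return tf.config.experimental.get_memory_info(device)["peak"] / 2**20

def dense_step(x, w):
    return tf.nn.relu(tf.matmul(x, w))

x = tf.random.normal([4096, 4096])
w = tf.random.normal([4096, 4096])

graph_step = tf.function(dense_step)                  # graph mode, no XLA
xla_step = tf.function(dense_step, jit_compile=True)  # XLA-compiled

print("graph peak MiB:", peak_gpu_mib(graph_step, x, w))
print("XLA   peak MiB:", peak_gpu_mib(xla_step, x, w))
```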
Thanks Gus! The video was very informative; I had not seen that one. However, it did not help explain the memory-usage increase we are observing. Looking forward to hearing about others' experiences. Thanks!
In this case performance does not seem to be the issue; rather, it is the increased GPU memory usage. Is there a list of known problems we may be hitting, or perhaps best practices in terms of operators? I would love to file a bug/issue on this, but as I mentioned in the original post, we have so far been unable to isolate the behavior in a shareable form.
Is there a resource (website, doc, etc.) showing the steps to visualize the HLO graph? I am unable to find anything comprehensive except for a couple of posts about Graphviz issues.
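In case it helps, one way to get at the HLO is to ask XLA to dump it via XLA_FLAGS and then render the resulting .dot files with Graphviz. A minimal sketch (the dump directory and function names are just examples):

```python
import os

# Set XLA_FLAGS before TensorFlow initializes XLA (safest: before importing tensorflow).
os.environ["XLA_FLAGS"] = (
    "--xla_dump_to=/tmp/xla_dump "
    "--xla_dump_hlo_as_text "   # writes HLO modules as text
    "--xla_dump_hlo_as_dot"     # writes .dot files, renderable with Graphviz
)

import tensorflow as tf

@tf.function(jit_compile=True)
def f(x):
    return tf.reduce_sum(tf.square(x))

f(tf.random.normal([1024]))  # triggers compilation and writes the dump files

# Then render one of the dumped graphs with Graphviz, e.g.:
#   dot -Tsvg /tmp/xla_dump/<module_name>.dot -o hlo.svg
```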