A couple of months back, we introduced full XLA support for the TensorFlow text generation models in Transformers. With XLA compatibility in place, text generation can run up to ~100x faster than before.
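The core of the speedup is compiling `generate` with XLA via `tf.function(jit_compile=True)` and keeping input shapes fixed so the compiled graph is reused. A minimal sketch of that pattern (the checkpoint name, padding length, and token counts here are illustrative):

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM

# gpt2 is used only as a small illustrative checkpoint.
tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = TFAutoModelForCausalLM.from_pretrained("gpt2")

# Compile generate with XLA: the first call traces and compiles,
# subsequent calls with the same input shapes reuse the compiled graph.
xla_generate = tf.function(model.generate, jit_compile=True)

# Padding to a fixed length keeps input shapes constant, avoiding recompilation.
inputs = tokenizer(["TensorFlow is"], padding="max_length", max_length=8,
                   return_tensors="tf")
out = xla_generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Without the fixed-length padding, each new input length would trigger a fresh XLA compilation, erasing the benefit.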
We recently wrote a guest post with the TensorFlow team discussing the technical considerations that went into delivering this user experience.
Read all about it here: