At SIG JVM, we have just decided to stop supporting and building native TensorFlow MKL-enabled artifacts for the following reasons:
With pretty much every new release of TensorFlow, the MKL build breaks on one platform or another, and it takes some gymnastics on our side to get it working again (when we manage to at all).
We have not investigated the reasons in depth, but performance with MKL was often many times worse than without it.
That being said, if anyone here has insights to share about the actual status of MKL in TensorFlow, and/or ideas on how we could keep supporting it without this trouble, that would be greatly appreciated.
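For context, the MKL-enabled artifacts in question are built roughly like this (a sketch; the exact flags and targets vary by TensorFlow version and platform):

```shell
# Historical flag for an Intel MKL/oneDNN-enabled build on Linux x86_64;
# newer TensorFlow versions route this through oneDNN.
bazel build --config=opt --config=mkl //tensorflow/tools/lib_package:libtensorflow

# In stock TensorFlow 2.5+ (no special build), oneDNN optimizations can
# instead be toggled at runtime via an environment variable:
export TF_ENABLE_ONEDNN_OPTS=1
```

Note that the runtime toggle in stock builds may make a separate MKL build unnecessary on recent TensorFlow versions, which is part of what makes the maintenance cost hard to justify.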
In my benchmark (training an NER model), an Intel Cascade Lake CPU with MKL was close to, and sometimes faster than, a GPU (since it uses system memory, it could handle a larger batch size).
That being said, I have never tested inference. But training was much faster than with a plain (non-MKL) CPU build on newer CPU architectures.