Eigen bfloat16 GEMM in TF

I am interested in getting Eigen's bfloat16 GEMM to work in TF, similar to float32/float64. On most architectures there are issues with the accumulators in bfloat16 GEMM, which cause rounding and over/underflow problems. But for PowerPC (Altivec/VSX), I have specialized code in Eigen that accumulates in float32, and on Power10 it uses hardware-accelerated instructions (MMA) for increased performance. Unfortunately, when I run the unit tests, the Eigen bfloat16 GEMM is never called. Could someone help me figure out how to tie this new bfloat16 GEMM into TF for PowerPC only?
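For context, a standalone Eigen bfloat16 matrix product looks roughly like this (a minimal sketch; it assumes a recent Eigen that ships the `Eigen::bfloat16` scalar type). This is the GEMM path I would like the TF kernels to reach on PowerPC:

```cpp
#include <iostream>
#include <Eigen/Core>
#include <Eigen/Dense>

int main() {
  using BF16Matrix =
      Eigen::Matrix<Eigen::bfloat16, Eigen::Dynamic, Eigen::Dynamic>;

  // Build two small bfloat16 matrices from random float data.
  BF16Matrix a = Eigen::MatrixXf::Random(64, 128).cast<Eigen::bfloat16>();
  BF16Matrix b = Eigen::MatrixXf::Random(128, 32).cast<Eigen::bfloat16>();

  // On PowerPC with the specialized Eigen kernels described above, this
  // product accumulates in float32 (and uses MMA on Power10); on other
  // platforms it falls back to the generic bfloat16 GEMM path.
  BF16Matrix c = a * b;

  std::cout << "c(0,0) = " << static_cast<float>(c(0, 0)) << "\n";
  return 0;
}
```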

Hi @Chip_Kerchner, at present TensorFlow binaries are built with AVX instructions. Since you are using MMA instructions, this is a feature-request type of issue. Please file a feature request on GitHub. Thank you.

The code shouldn't be specific to PowerPC (or MMA), though it would only be included for this platform in TensorFlow.

I thought someone might know how Eigen's matrix multiply routines are tied into TensorFlow, and whether that happens in a single place or several. With that knowledge, I can hopefully make the changes myself.
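My (possibly wrong) understanding is that TF's CPU MatMul kernels ultimately go through Eigen's tensor contraction machinery, so a call along these lines is what I want to end up in the bfloat16 GEMM. This is just an illustrative sketch with Eigen's public Tensor API, not actual TF kernel code:

```cpp
#include <iostream>
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  // 2-D tensors standing in for the MatMul inputs a CPU kernel would hand
  // to Eigen's contraction machinery.
  Eigen::Tensor<Eigen::bfloat16, 2> lhs(64, 128);
  Eigen::Tensor<Eigen::bfloat16, 2> rhs(128, 32);
  lhs.setConstant(Eigen::bfloat16(1.0f));
  rhs.setConstant(Eigen::bfloat16(0.5f));

  // Contract lhs dimension 1 with rhs dimension 0, i.e. a plain matmul.
  Eigen::array<Eigen::IndexPair<int>, 1> dims = {Eigen::IndexPair<int>(1, 0)};
  Eigen::Tensor<Eigen::bfloat16, 2> out = lhs.contract(rhs, dims);

  std::cout << "out(0,0) = " << static_cast<float>(out(0, 0)) << "\n";
  return 0;
}
```

If someone can confirm whether the bfloat16 MatMul kernel registration for CPU actually dispatches into this contraction path (or is routed elsewhere, e.g. cast to float32 first), that would tell me where to hook in the PowerPC code.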