Hello,
I am trying to compile libtensorflow_cc.so version 2.14.0 on Arch Linux.
At the end, linking fails with errors like:
...
/usr/bin/ld: bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_cast_op.pic.lo(gpu_op_cast.pic.o): in function `tensorflow::(anonymous namespace)::MlirCastGPUDT_INT64DT_INT16Op::Invoke(tensorflow::OpKernelContext*, llvm::SmallVectorImpl<tensorflow::UnrankedMemRef>&)':
gpu_op_cast.cc:(.text._ZN10tensorflow12_GLOBAL__N_129MlirCastGPUDT_INT64DT_INT16Op6InvokeEPNS_15OpKernelContextERN4llvm15SmallVectorImplINS_14UnrankedMemRefEEE+0x10): undefined reference to `_mlir_ciface_Cast_GPU_DT_INT64_DT_INT16'
/usr/bin/ld: bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_cast_op.pic.lo(gpu_op_cast.pic.o): in function `tensorflow::(anonymous namespace)::MlirCastGPUDT_INT64DT_INT32Op::Invoke(tensorflow::OpKernelContext*, llvm::SmallVectorImpl<tensorflow::UnrankedMemRef>&)':
gpu_op_cast.cc:(.text._ZN10tensorflow12_GLOBAL__N_129MlirCastGPUDT_INT64DT_INT32Op6InvokeEPNS_15OpKernelContextERN4llvm15SmallVectorImplINS_14UnrankedMemRefEEE+0x10): undefined reference to `_mlir_ciface_Cast_GPU_DT_INT64_DT_INT32'
/usr/bin/ld: bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_cast_op.pic.lo(gpu_op_cast.pic.o): in function `tensorflow::(anonymous namespace)::MlirCastGPUDT_INT64DT_INT64Op::Invoke(tensorflow::OpKernelContext*, llvm::SmallVectorImpl<tensorflow::UnrankedMemRef>&)':
gpu_op_cast.cc:(.text._ZN10tensorflow12_GLOBAL__N_129MlirCastGPUDT_INT64DT_INT64Op6InvokeEPNS_15OpKernelContextERN4llvm15SmallVectorImplINS_14UnrankedMemRefEEE+0x10): undefined reference to `_mlir_ciface_Cast_GPU_DT_INT64_DT_INT64'
...
There are a few hundred of these.
All 649 error lines containing ‘undefined reference to’ are of the following form:
^gpu_op_[a-z0-9_]*\.cc:(\.text\._Z[^+]*+0x[0-9a-f]*): undefined reference to `_mlir_ciface_[A-Za-z0-9_]*.$
showing that all undefined references come from files with a name like gpu_op_[a-z0-9_]*\.cc, all of which exist exclusively in build/tensorflow/tensorflow/core/kernels/mlir_generated/.
After further investigation, the problem seems to come from macros that use the macro MLIR_FUNCTION, defined in tensorflow/tensorflow/core/kernels/mlir_generated/base_op.h:
#define MLIR_FUNCTION(tf_op, platform, input_type, output_type) \
_mlir_ciface_##tf_op##_##platform##_##input_type##_##output_type
In particular, it is used by the macros GENERATE_UNARY_KERNEL3, GENERATE_BINARY_KERNEL3 and GENERATE_TERNARY_KERNEL3, which are more or less similar, so let's just look at one:
#define GENERATE_UNARY_KERNEL3(tf_op, platform, input_type, output_type, casted_input_type, casted_output_type)
which produces code like (I did some formatting):
extern "C" void MLIR_FUNCTION(tf_op, platform, input_type, output_type) // <-- Undefined reference.
    (UnrankedMemRef* result, OpKernelContext* ctx, UnrankedMemRef* arg);
namespace {
class MLIR_OP(tf_op, platform, casted_input_type, casted_output_type) :
    public MLIROpKernel<output_type, typename EnumToDataType<output_type>::Type, casted_output_type>
{
 public:
  using MLIROpKernel::MLIROpKernel;
  UnrankedMemRef Invoke(OpKernelContext* ctx, llvm::SmallVectorImpl<UnrankedMemRef>& args) override
  {
    UnrankedMemRef result;
    MLIR_FUNCTION(tf_op, platform, input_type, output_type)(&result, ctx, &args[0]); // <-- Undefined reference.
    return result;
  }
};
} // namespace
Where should these symbols have been defined? For example, which Bazel target (some .lo or .a file) should have _mlir_ciface_Cast_GPU_DT_INT64_DT_UINT8 defined (to pick a random one)?
I checked ALL object files and archives that are being linked, and in my case only the following mention _mlir_ciface symbols:
bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_nextafter_op.pic.lo
bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_relu_op.pic.lo
bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_softplus_op.pic.lo
bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_softsign_op.pic.lo
bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_constant_op.pic.lo
bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_cast_op.pic.lo
bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_cwise_unary_op.pic.lo
bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_cwise_binary_op.pic.lo
bazel-out/k8-opt/bin/tensorflow/compiler/mlir/tools/kernel_gen/libtf_framework_c_interface.pic.a
bazel-out/k8-opt/bin/tensorflow/compiler/mlir/tools/kernel_gen/libtf_gpu_runtime_wrappers.pic.a
bazel-out/k8-opt/bin/external/llvm-project/mlir/lib_mlir_runner_utils.pic.a
bazel-out/k8-opt/bin/external/llvm-project/mlir/lib_mlir_c_runner_utils.pic.a
The undefined symbols are all UND (undefined) and all come from the bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_*.pic.lo files. The other files define some _mlir_ciface symbols, but not the ones that are missing.
Please help.
EDIT: I managed to compile and link 2.13.0, and it turns out that 2.14.0 isn’t linking with any of the bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/lib*_kernel_generator.pic.a files. Those archives have been created, but they aren’t linked, hence the undefined references.
I am totally new to Bazel, so any help figuring out what the problem is would be appreciated.
I’d like to add the [build] tag to this post, but I can’t figure out where/how I can do that.