Getting 0 nodes delegated while using TFLite C Library with TFLite Flex Delegate

I am using the TFLite C library to write C code that runs model inference on an edge device.
In my use case, I require certain operations, such as tf.unpack on a tensor of shape (0, 2), which are supported by TensorFlow but not currently available in TFLite.

How I built the TFLite C Library

I found a solution article on the Google AI Edge website (here), which provides instructions on building the TFLite C library using CMake. I followed these steps and successfully built the library without any errors.

As a result, I obtained libtensorflowlite_c.so and placed it in /usr/local/lib, which is the system library directory on Ubuntu.
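
For reference, a quick sanity check that the C library links and loads is a tiny program like the following (a minimal sketch, assuming the TFLite C header tensorflow/lite/c/c_api.h is on the include path and the program is linked with -ltensorflowlite_c):

#include <cstdio>

#include "tensorflow/lite/c/c_api.h"

// Minimal check that libtensorflowlite_c.so links and loads at runtime:
// it only prints the TFLite runtime version string.
int main() {
    std::printf("TFLite runtime version: %s\n", TfLiteVersion());
    return 0;
}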

How I built the TensorFlow Flex Delegate library

Similarly, there is an article here that explains how to build libtensorflowlite_flex.so. I followed the instructions provided and successfully built the library without encountering any issues.

To verify the build, I ran the command:

strings /usr/local/lib/libtensorflowlite_flex.so | grep Unpack

This command displayed several symbols containing Unpack, which suggests to me that the build was successful; a more direct check of the delegate entry point is sketched right after the output below.

Output:
UnpackOp<CPUDevice, ::tensorflow::uint64>
UnpackOp<CPUDevice, ::int64_t>
UnpackOp<CPUDevice, ::tensorflow::uint32>
UnpackOp<CPUDevice, ::tensorflow::uint16>
UnpackOp<CPUDevice, ::tensorflow::int16>
UnpackOp<CPUDevice, ::tensorflow::uint8>
UnpackOp<CPUDevice, ::tensorflow::int8>
UnpackOp<CPUDevice, ::tensorflow::int32>
UnpackOp<CPUDevice, Eigen::half>
UnpackOp<CPUDevice, ::tensorflow::bfloat16>
UnpackOp<CPUDevice, ::tensorflow::complex64>
UnpackOp<CPUDevice, ::tensorflow::complex128>
UnpackOp<CPUDevice, ::tensorflow::tstring>
UnpackOp<CPUDevice, ::tensorflow::ResourceHandle>
UnpackOp<CPUDevice, ::tensorflow::Variant>
UnpackOp<CPUDevice, ::tensorflow::float8_e5m2>
UnpackOp<CPUDevice, ::tensorflow::float8_e4m3fn>
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::uint64, true >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::uint64, false >
TensorArrayUnpackOrScatterOp<CPUDevice, ::int64_t, true >
TensorArrayUnpackOrScatterOp<CPUDevice, ::int64_t, false >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::uint32, true >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::uint32, false >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::uint16, true >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::uint16, false >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::int16, true >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::int16, false >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::uint8, true >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::uint8, false >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::int8, true >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::int8, false >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::int32, true >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::int32, false >
TensorArrayUnpackOrScatterOp<CPUDevice, Eigen::half, true >
TensorArrayUnpackOrScatterOp<CPUDevice, Eigen::half, false >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::bfloat16, true >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::bfloat16, false >
TensorArrayUnpackOrScatterOp<CPUDevice, float, true >
TensorArrayUnpackOrScatterOp<CPUDevice, float, false >
TensorArrayUnpackOrScatterOp<CPUDevice, double, true >
TensorArrayUnpackOrScatterOp<CPUDevice, double, false >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::complex64, true >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::complex64, false >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::complex128, true >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::complex128, false >
TensorArrayUnpackOrScatterOp<CPUDevice, bool, true >
TensorArrayUnpackOrScatterOp<CPUDevice, bool, false >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::tstring, true >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::tstring, false >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::ResourceHandle, true >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::ResourceHandle, false >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::Variant, true >
TensorArrayUnpackOrScatterOp<CPUDevice, ::tensorflow::Variant, false >
'(new gtl::FlatSet<string>{"ArgMax", "ArgMin", "AudioSpectrogram", "AvgPool", "BatchMatMul", "BatchMatMulV2", "BatchNormWithGlobalNormalization", "BatchToSpace", "BatchToSpaceND", "Bincount", "BroadcastArgs", "BroadcastGradientArgs", "Bucketize", "CTCBeamSearchDecoder", "CTCGreedyDecoder", "CTCLoss", "CompareAndBitpack", "ComplexAbs", "Concat", "ConcatOffset", "ConcatV2", "Conv2D", "Copy", "CopyHost", "Cross", "CudnnRNN", "CudnnRNNBackprop", "CudnnRNNBackpropV2", "CudnnRNNBackpropV3", "CudnnRNNCanonicalToParams", "CudnnRNNCanonicalToParamsV2", "CudnnRNNParamsSize", "CudnnRNNParamsToCanonical", "CudnnRNNParamsToCanonicalV2", "CudnnRNNV2", "CudnnRNNV3", "CumProd", "CumSum", "DebugNanCount", "DebugNumericSummary", "DecodeProtoV2", "DecodeWav", "DeepCopy", "DepthToSpace", "Dequantize", "Diag", "DiagPart", "EditDistance", "Empty", "EncodeProtoV2", "EncodeWav", "ExtractImagePatches", "ExtractVolumePatches", "Fill", "Gather", "GatherNd", "GatherV2", "HistogramFixedWidth", "InvertPermutation", "IsInf", "IsNan", "Isfinite", "LinSpace", "LowerBound", "MatMul", "MatrixDiag", "MatrixDiagPart", "MatrixDiagPartV2", "MatrixDiagV2", "Mfcc", "Multinomial", "OneHot", "Pack", "ParameterizedTruncatedNormal", "PopulationCount", "RandomGamma", "RandomPoisson", "RandomPoissonV2", "RandomStandardNormal", "RandomUniform", "RandomUniformInt", "Range", "Rank", "RequantizationRange", "Requantize", "ReverseSequence", "Shape", "ShapeN", "Size", "SpaceToBatch", "SpaceToBatchND", "SpaceToDepth", "SparseMatMul", "Split", "SplitV", "TruncatedNormal", "Unique", "UniqueV2", "UniqueWithCounts", "UniqueWithCountsV2", "Unpack", "UnravelIndex", "UpperBound", "Where"})' Must be non NULL
TensorArrayUnpack
Unpack
UnpackOp<CPUDevice, float>
UnpackOp<CPUDevice, double>
UnpackOp<CPUDevice, bool>
UnpackOp<CPUDevice, int32>
UnpackOp<CPUDevice, int64>
UnpackGrad
tfg.Unpack
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceEmEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceElEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceEjEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceEtEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceEsEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceEhEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceEaEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceEiEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceENS1_4halfEEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceENS1_8bfloat16EEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceEfEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceEdEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceESt7complexIfEEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceESt7complexIdEEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceEbEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceEN3tsl7tstringEEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceENS_14ResourceHandleEEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceENS_7VariantEEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceEN9ml_dtypes15float8_internal11float8_e5m2EEE
N10tensorflow8UnpackOpIN5Eigen16ThreadPoolDeviceEN9ml_dtypes15float8_internal13float8_e4m3fnEEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEmLb1EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEmLb0EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceElLb1EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceElLb0EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEjLb1EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEjLb0EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEtLb1EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEtLb0EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEsLb1EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEsLb0EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEhLb1EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEhLb0EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEaLb1EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEaLb0EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEiLb1EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEiLb0EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceENS1_4halfELb1EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceENS1_4halfELb0EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceENS1_8bfloat16ELb1EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceENS1_8bfloat16ELb0EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEfLb1EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEfLb0EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEdLb1EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEdLb0EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceESt7complexIfELb1EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceESt7complexIfELb0EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceESt7complexIdELb1EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceESt7complexIdELb0EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEbLb1EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEbLb0EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEN3tsl7tstringELb1EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceEN3tsl7tstringELb0EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceENS_14ResourceHandleELb1EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceENS_14ResourceHandleELb0EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceENS_7VariantELb1EEE
N10tensorflow28TensorArrayUnpackOrScatterOpIN5Eigen16ThreadPoolDeviceENS_7VariantELb0EEE
Unpack
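
Beyond grepping for kernel names, a more direct check is whether the library actually exports the TF_AcquireFlexDelegate entry point that my loader code below resolves at runtime. A minimal standalone sketch (assuming a Linux system where dlopen is available):

#include <dlfcn.h>

#include <cstdio>

// Checks that libtensorflowlite_flex.so can be dlopen'ed and that it exports
// the TF_AcquireFlexDelegate symbol used further below.
int main() {
    void* handle = dlopen("libtensorflowlite_flex.so", RTLD_NOW | RTLD_GLOBAL);
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    void* sym = dlsym(handle, "TF_AcquireFlexDelegate");
    std::printf("TF_AcquireFlexDelegate %s\n", sym != nullptr ? "found" : "NOT found");
    dlclose(handle);
    return 0;
}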

How I use the C library functions to load the model and the Flex delegate library

I have checked the model, and some ops do have the Flex prefix in their names (a sketch of one way to do this check follows the converter settings below).

It was converted using the following settings:

converter = tf.lite.TFLiteConverter.from_saved_model(model_folder)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.SELECT_TF_OPS,
    tf.lite.OpsSet.TFLITE_BUILTINS,
]
converter.allow_custom_ops = True
converter.legalize_custom_tensor_list_ops = True
converter._experimental_lower_tensor_list_ops = False 
converter.experimental_enable_resource_variables = True
tflite_model = converter.convert()
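
For reference, one way to list the operator codes in the converted file and see which ones became Flex custom ops (a rough sketch, assuming the TFLite C++ headers model_builder.h and schema_generated.h are available in the build):

#include <cstdio>

#include "tensorflow/lite/model_builder.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Prints every operator code in a .tflite file; Flex ops appear as custom ops
// whose names start with "Flex" (e.g. "FlexUnpack").
int main(int argc, char** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s model.tflite\n", argv[0]);
        return 1;
    }
    auto fb_model = tflite::FlatBufferModel::BuildFromFile(argv[1]);
    if (fb_model == nullptr) {
        std::fprintf(stderr, "failed to load %s\n", argv[1]);
        return 1;
    }
    const tflite::Model* model = fb_model->GetModel();
    for (const tflite::OperatorCode* code : *model->operator_codes()) {
        if (code->custom_code() != nullptr)
            std::printf("custom op:  %s\n", code->custom_code()->c_str());
        else
            std::printf("builtin op: %s\n",
                        tflite::EnumNameBuiltinOperator(code->builtin_code()));
    }
    return 0;
}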

Then, I use the code below to load the model and delegate operations that are not supported by TFLite to the Flex library.

// Presumed includes/usings for this snippet:
#include <dlfcn.h>                              // dlerror
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include "tensorflow/lite/c/c_api.h"            // TfLiteModel / TfLiteInterpreter C API
#include "tensorflow/lite/interpreter.h"        // tflite::Interpreter::TfLiteDelegatePtr
#include "tensorflow/lite/shared_library.h"     // tflite::SharedLibrary
using namespace std;
using namespace tflite;

void tflite_load(const string model_path, map<string, TfLiteSignatureRunner*>& sig_map) {
    // Load model
    TfLiteModel* model = TfLiteModelCreateFromFile(model_path.c_str());
    if (model == nullptr) {
        cerr << "Failed to load model." << endl;
        return;
    }

    // Create Interpreter
    TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();
    TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);
    TfLiteInterpreterOptionsDelete(options);

    // Dynamically load the Flex delegate library and resolve its entry point.
    auto hdll = SharedLibrary::LoadLibrary("libtensorflowlite_flex.so");
    if (hdll == nullptr) {
        cerr << "Load failed: " << dlerror() << endl;
        return;
    }
    auto TF_AcquireFlexDelegate = reinterpret_cast<Interpreter::TfLiteDelegatePtr (*)()>(
        SharedLibrary::GetLibrarySymbol(hdll, "TF_AcquireFlexDelegate"));
    if (TF_AcquireFlexDelegate == nullptr) {
        cerr << "TF_AcquireFlexDelegate couldn't be resolved: " << dlerror() << endl;
        return;
    }

    // Acquire the Flex delegate and attach it to the already-created interpreter.
    std::unique_ptr<TfLiteDelegate, void (*)(TfLiteDelegate*)> delegate = TF_AcquireFlexDelegate();
    TfLiteStatus status = TfLiteInterpreterModifyGraphWithDelegate(interpreter, delegate.get());
    if (status == kTfLiteOk)
        cout << "ModifyGraphWithDelegate Ok" << endl;

    cout << "----------Load model successfully!----------" << endl;
}

However, the log always shows that no nodes were delegated to the Flex delegate:

INFO: Created TensorFlow Lite delegate for select TF ops.
2025-03-14 17:43:31.412379: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO: TfLiteFlexDelegate delegate: 0 nodes delegated out of 623 nodes with 0 partitions.

ModifyGraphWithDelegate Ok
----------Load model successfully!----------
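
For completeness: the C API also allows attaching a delegate through the interpreter options before the interpreter is created, via TfLiteInterpreterOptionsAddDelegate, rather than modifying the graph afterwards. A minimal sketch of that ordering (reusing the TF_AcquireFlexDelegate pointer resolved above; depending on the TFLite version the delegate parameter may be typed as TfLiteOpaqueDelegate* and need a cast, and I have not confirmed whether this ordering changes the "0 nodes delegated" result):

// Variation: apply the Flex delegate at interpreter-creation time.
// The delegate object must stay alive for as long as the interpreter uses it.
TfLiteInterpreter* create_with_flex(const string& model_path,
                                    Interpreter::TfLiteDelegatePtr& delegate) {
    TfLiteModel* model = TfLiteModelCreateFromFile(model_path.c_str());
    if (model == nullptr) return nullptr;

    TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();
    TfLiteInterpreterOptionsAddDelegate(options, delegate.get());

    TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);
    TfLiteInterpreterOptionsDelete(options);
    return interpreter;
}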

How should I resolve this issue?
I would really appreciate any insights or suggestions. Thanks in advance for your help!

Hi @l1234567j222, apologies for the delayed response.
It seems the issue has already been addressed here. Please try with the latest versions as suggested in that issue and let us know if you are still facing the problem.
Thanks!