On-device training of simple autoencoder fails because of Select Ops

Hi,

I am building and pre-training a basic autoencoder in Python with TF + Keras, then converting it to a LiteRT “.tflite” model in order to deploy it on a 32-bit ARM Cortex-A7.

Here is the model in question:

import tensorflow as tf

NUM_FEATURES = 10
INPUT_SHAPE = (NUM_FEATURES,)
OUPUT_SHAPE = INPUT_SHAPE    # Autoencoder

# Min and max number of neurons in layers
N_MAX_HIDDEN = 2 * NUM_FEATURES    # Expansion layer(s) limits
N_MIN_HIDDEN = min(3, NUM_FEATURES // 2)    # Bottleneck

class Model(tf.Module):

    def __init__(self):
        self.model = tf.keras.Sequential([
            tf.keras.layers.InputLayer(INPUT_SHAPE, name='input'),
            tf.keras.layers.Dense(N_MAX_HIDDEN, activation='selu', name='dense_1_expansion'),
            tf.keras.layers.Dense(N_MIN_HIDDEN, activation='selu', name='dense_2_bottleneck'),
            tf.keras.layers.Dense(N_MAX_HIDDEN, activation='selu', name='dense_3_expansion'),
            tf.keras.layers.Dense(INPUT_SHAPE[0], activation=None, name='dense_4_output')
        ])
        
        self.model.compile(
            optimizer=tf.keras.optimizers.Adam(),
            loss=tf.keras.losses.mse)

    # The `train` function takes a batch of inputs and their reconstruction targets.
    @tf.function(input_signature=[
      tf.TensorSpec([None, *INPUT_SHAPE], tf.float32),
      tf.TensorSpec([None, *OUPUT_SHAPE], tf.float32),
    ])
    def train(self, x, y):
        with tf.GradientTape() as tape:
            prediction = self.model(x)
            loss = self.model.loss(y, prediction)
        gradients = tape.gradient(loss, self.model.trainable_variables)
        self.model.optimizer.apply_gradients(zip(gradients, self.model.trainable_variables))
        return {"loss": loss}
    
    @tf.function(input_signature=[
      tf.TensorSpec([None, *INPUT_SHAPE], tf.float32),
    ])
    def infer(self, x):
        inferences = self.model(x)
        return {"inferences": inferences,}
    
    @tf.function(input_signature=[tf.TensorSpec(shape=[], dtype=tf.string)])
    def save(self, checkpoint_path):
        tensor_names = [weight.name for weight in self.model.weights]
        tensors_to_save = [weight.read_value() for weight in self.model.weights]
        tf.raw_ops.Save(filename=checkpoint_path, tensor_names=tensor_names, data=tensors_to_save, name='save')
        return {"checkpoint_path": checkpoint_path}
    
    @tf.function(input_signature=[tf.TensorSpec(shape=[], dtype=tf.string)])
    def restore(self, checkpoint_path):
        restored_tensors = {}
        for var in self.model.weights:
            restored = tf.raw_ops.Restore(file_pattern=checkpoint_path, tensor_name=var.name, dt=var.dtype, name='restore')
            var.assign(restored)
            restored_tensors[var.name] = restored
        return restored_tensors

My goal is to perform on-device finetuning according to the docs here: On-Device Training with LiteRT. So I scrupulously followed the steps there, including the signatures and the conversion parts.

Yet I am facing conversion issues because of some Select Ops and the Flex delegate.

W tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2921] TFLite interpreter needs to link Flex delegate in order to run the model since it contains the following Select TFop(s):
Flex ops: FlexBroadcastGradientArgs, FlexRestore, FlexSave, FlexSeluGrad

It seems strange to me that such a simple model involves non-built-in ops…

Could I get rid of those Select Ops in my model or could I avoid having to link / compile the (full) Flex delegate?
Installing the full TensorFlow pip package on my edge device, as suggested here, is unfortunately not an option for me… I’d lose all the benefits of the lite library…

Package versions:

  • TensorFlow: 2.15.0
  • LiteRT (ai_edge_litert): 1.4.0

Thanks in advance for your help.


Hi @JPat_MC, I apologize for the delayed response. As far as I know, the Flex ops you are seeing fall into two categories, and each needs a different treatment to get the minimum binary size for your Cortex-A7 device.

FlexSeluGrad and FlexBroadcastGradientArgs appear because the gradient computation for the selu activation function is not a LiteRT built-in operator. Changing your Keras layer activation from activation='selu' to a built-in option such as activation='relu' or activation='relu6' eliminates the need for these non-native gradient kernels. Please refer to the LiteRT and TensorFlow operator compatibility guide.
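A minimal sketch of that activation swap, reusing the layer sizes and names from the model above (this is an assumption about the fix, not a verified result for your exact setup):

```python
import tensorflow as tf

NUM_FEATURES = 10

# Same architecture as the original autoencoder, but with 'relu'
# instead of 'selu' so the training graph only needs the ReluGrad
# kernel, which LiteRT covers with built-in ops.
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer((NUM_FEATURES,), name='input'),
    tf.keras.layers.Dense(2 * NUM_FEATURES, activation='relu', name='dense_1_expansion'),
    tf.keras.layers.Dense(3, activation='relu', name='dense_2_bottleneck'),
    tf.keras.layers.Dense(2 * NUM_FEATURES, activation='relu', name='dense_3_expansion'),
    tf.keras.layers.Dense(NUM_FEATURES, activation=None, name='dense_4_output'),
])
```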

FlexSave and FlexRestore are required because the LiteRT on-device training architecture uses tf.raw_ops.Save and tf.raw_ops.Restore to persist and load trained weights in checkpoint format. These ops cannot be removed if you need the on-device fine-tuning feature.

Since the checkpointing ops are mandatory Flex ops, you should use LiteRT’s selective build to produce a runtime library (.so) that includes only the few required Flex ops (Save/Restore) and nothing else.

  • For embedded Linux (non-Android), you must use the Bazel build system, as it is the only one that supports Select TF Ops.

  • Ensure your model conversion script explicitly enables tf.lite.OpsSet.SELECT_TF_OPS

  • Use the appropriate Bazel configuration for your 32-bit ARM target to cross-compile the minimal runtime library. Please refer to the official documentation.
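The converter setup from the second bullet can be sketched as follows. TinyModule and the /tmp path are placeholders for illustration, not the poster’s actual model; the converter flags themselves are the ones the on-device training guide uses:

```python
import tensorflow as tf

# Placeholder module standing in for the real trainable model.
class TinyModule(tf.Module):
    def __init__(self):
        self.w = tf.Variable(tf.zeros([10, 10]))

    @tf.function(input_signature=[tf.TensorSpec([None, 10], tf.float32)])
    def infer(self, x):
        return {"y": tf.matmul(x, self.w)}

m = TinyModule()
saved_model_dir = "/tmp/select_ops_demo"  # placeholder path
tf.saved_model.save(
    m, saved_model_dir,
    signatures={"infer": m.infer.get_concrete_function()})

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,   # standard built-in kernels
    tf.lite.OpsSet.SELECT_TF_OPS,     # allow Flex (Select TF) ops
]
# Needed so trainable variables survive conversion for on-device training.
converter.experimental_enable_resource_variables = True
tflite_model = converter.convert()
```

The resulting .tflite file then only needs the selectively built Flex library at runtime, rather than the full Flex delegate.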

If I’ve missed something here, or if you run into any trouble with the cross-compilation process, please let us know. Thank you for your cooperation and patience.
