TensorFlow Lite model with select ops in an Android Studio app (Kotlin)

Hi :) I am trying to load a TFLite model in my Android app, but I get these two error messages:
(Screenshot: the two error messages, taken 2024-03-20 at 14:19:23)

This is how I converted the model:

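import tensorflow as tf

tf_model = tf.saved_model.load('Models/MoViNet/models/movinet_freez10_3')
# Note: from_keras_model() expects a Keras model, whereas tf.saved_model.load()
# returns a generic SavedModel object; TFLiteConverter.from_saved_model(path)
# is the entry point that matches a SavedModel directory.
converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
converter.target_spec.supported_ops = [
  tf.lite.OpsSet.TFLITE_BUILTINS,  # built-in TFLite ops where possible
  tf.lite.OpsSet.SELECT_TF_OPS     # fall back to TF ops (needs the select-tf-ops dependency)
]
tflite_model = converter.convert()
with open('Models/MoViNet/lite/model.tflite', 'wb') as f:
    f.write(tflite_model)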

This is how I try to load the model when it is in the assets folder:

    try {
        val tfliteModel = FileUtil.loadMappedFile(context, "modelCopy.tflite")
        val tflite = Interpreter(tfliteModel)
    } catch (e: IOException) {
        Log.e("tfliteSupport", "Error reading model", e)
    }

I have also tried this with the model in an ml package (same error):

import com.example.slr.ml.Model
val model = Model.newInstance(context)

And I have these dependencies in my Gradle file:

implementation("org.tensorflow:tensorflow-lite:0.0.0-nightly-SNAPSHOT")
// This dependency adds the necessary TF op support.
implementation("org.tensorflow:tensorflow-lite-select-tf-ops:0.0.0-nightly-SNAPSHOT")
implementation("org.tensorflow:tensorflow-lite-support:0.0.0-nightly-SNAPSHOT")
implementation("org.tensorflow:tensorflow-lite-metadata:0.0.0-nightly-SNAPSHOT")

Does anyone have an idea how to include the AvgPool3D operation, or another way to fix this issue? (It is listed among the supported core ops here.)

Hello, I have also been struggling with MoViNet for quite a while and haven't gotten as far as you. Is there any way I could contact you with a few questions about how you trained the model and converted it to TFLite?

Hi @I_M

Can you open your .tflite model in the Netron app and check its inputs and outputs?
Can you paste the result here?

Regards

The model is very long, so I took a screenshot of the input and output description.

This is the whole model

From the Netron result, I see that your .tflite file was generated with a single input of shape [1, 1, 1, 1, 3]. I think this is wrong, don't you? There is probably an issue during the conversion. Check the inputs/outputs of the MoViNet model again and make sure they are the same after conversion.
You can do the same with Netron and the MoViNet saved model (use the .zip or the .pb file).
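
A quick way to sanity-check the shape in code is to compute the expected input shape and its byte size and compare them with what the interpreter reports. On Android, the actual values would come from `tflite.getInputTensor(0).shape()` and `.numBytes()`. This is a minimal sketch; the helper names are hypothetical, and it assumes a float32 input laid out as [batch, frames, height, width, 3]:

```kotlin
// Hypothetical helpers for checking a video model's input shape.
// Assumes a float32 input laid out as [batch, frames, height, width, channels].

const val FLOAT32_BYTES = 4

fun expectedVideoInputShape(batch: Int, frames: Int, resolution: Int, channels: Int = 3): IntArray =
    intArrayOf(batch, frames, resolution, resolution, channels)

// Number of bytes a float32 tensor of this shape occupies.
fun float32ByteSize(shape: IntArray): Int =
    shape.fold(1) { acc, dim -> acc * dim } * FLOAT32_BYTES

// True if the shape reported by the interpreter equals the expected one.
fun shapeMatches(actual: IntArray, expected: IntArray): Boolean =
    actual.contentEquals(expected)
```

For 20 frames at 172x172, `expectedVideoInputShape(1, 20, 172)` is [1, 20, 172, 172, 3], which occupies 7,100,160 bytes as float32; a reported shape of [1, 1, 1, 1, 3] would fail `shapeMatches`.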

I agree that the input dimensions look wrong; the shape is supposed to be [batch_size, frames, resolution, resolution, 3], not [1, 1, 1, 1, 3].

I tried to look at the saved model using the .pb file (it did not work with the .zip), but the .pb graph looked very different from the .tflite model and did not really make sense to me.
This is a screenshot of the far left of the graph, and it looks much the same all the way to the end.

Also, this is how I saved the model. Is something wrong with this, perhaps?

input_shape = (batch_size, frames, resolution, resolution, 3)
# Replace unknown (None) dimensions with 1 so the model can be built and traced
input_shape_concrete = [1 if s is None else s for s in input_shape]
print(input_shape_concrete)
model.build(input_shape_concrete)
_ = model(tf.ones(input_shape_concrete))
tf.saved_model.save(model, f'Models/MoViNet/models/{name}')

I see!

That leaves us no other option than for you to create a Colab notebook and share it here. Somewhere in the process there is an error. If it is not under NDA, paste a link to the notebook with the code so I can take a look.

Regards

Okay, here is the Colab.
Let me know if you also need to see the code for processing the input videos.
Thanks so much for trying to help me!!

Since I cannot run and debug it (the Colab does not contain links to the model), the only suggestion I can make right now is to test the model with an input before converting.
I think the error is somewhere here:

input_shape = (batch_size, frames, resolution, resolution, 3)
input_shape_concrete = [1 if s is None else s for s in input_shape]
print(input_shape_concrete)
model.build(input_shape_concrete)

_ = model(tf.ones(input_shape_concrete))
tf.saved_model.save(model, f'Models/MoViNet/models/{name}')

So before saving, run inference on an input to see whether the result is OK. Then adjust the code and save the model before you convert it.

Ping me when you have updates.

So I did what you said, and the inference results were the same before saving and after loading the saved model. I therefore googled around some more and finally managed to get the right input shape for my .tflite model by using this code for saving and converting:

input_shape = [1, frames, resolution, resolution, 3]
export_saved_model.export_saved_model(model=model, input_shape=input_shape, export_path=f'Models/MoViNet/models/{name}')

model = tf.saved_model.load(f"Models/MoViNet/models/{name}")
concrete_func = model.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
concrete_func.inputs[0].set_shape([1, frames, resolution, resolution, 3])
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.target_spec.supported_ops = [
  tf.lite.OpsSet.TFLITE_BUILTINS,
  tf.lite.OpsSet.SELECT_TF_OPS
]
tflite_model = converter.convert()

The start of the model now looks like this:

However, I am now back to struggling with Android Studio. Do you have any tips on how to run my model there, and more specifically on how to get the right input format? I can't find code examples where the input is a video.

So far this is what I have: I convert the video to a list of Bitmap frames. I also can't decide whether to use the Interpreter API or the other method I started with (the generated model class).

fun predict(context: Context, mmr: MediaMetadataRetriever): Pair<List<Float>, Long> {

    val frames = videoFrames(mmr)
    
    // Option 1: One way to use the model, but unsure how to create the input
    val model = MovinetA1Base04052024091340.newInstance(context)
    val input = TensorBuffer.createFixedSize(intArrayOf(1,4), DataType.FLOAT32) // not sure about this

    // Option 2: interpreter
    try {
        val tfliteModel = FileUtil.loadMappedFile(context, "movinet_a1_base_04042024_170320Copy.tflite")
        val tflite = Interpreter(tfliteModel)
    } catch (e: IOException) {
        Log.e("tfliteSupport", "Error reading model", e)
    }
    
    // Unrelated to the options
    val startTime = SystemClock.elapsedRealtime()
    
    val output = emptyList<Float>()

    val inferenceTime = SystemClock.elapsedRealtime()-startTime
    return Pair(output, inferenceTime)
}


private fun videoFrames(mmr: MediaMetadataRetriever): List<Bitmap> {
    val frames = mutableListOf<Bitmap>()
    val fps = 2
    var durationMs = 0.0

    val duration = mmr.extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION)
    if (duration !=null) {
        durationMs = duration.toDouble()
    }
    val durationInSec = ceil(durationMs/1000).toInt()
    Log.i("Duration", durationInSec.toString())

    for (i in 0 until fps*durationInSec){
        // getFrameAtTime() expects microseconds, while durationMs is in milliseconds
        val timeUs = (i * durationMs * 1000 / (fps * durationInSec)).toLong()
        val bitmap = mmr.getFrameAtTime(timeUs)
        if (bitmap!=null) {
            val resized = Bitmap.createScaledBitmap(bitmap, 172,172,false)
            frames.add(resized)
            Log.i("Bitmap", "Registered bitmap frame at $timeUs")
        } else{
            Log.i("No bitmap", "Found no bitmap at frame: $timeUs")
        }

    }
    return frames
}
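
One detail worth double-checking in `videoFrames`: `MediaMetadataRetriever.getFrameAtTime(long)` takes its time in microseconds, while `METADATA_KEY_DURATION` is reported in milliseconds. A minimal sketch of computing evenly spaced sample times in microseconds; `frameTimestampsUs` is a hypothetical helper, kept free of Android classes so it can be tested on the JVM:

```kotlin
// Hypothetical helper: evenly spaced sample times for numFrames frames.
// durationMs comes from METADATA_KEY_DURATION (milliseconds);
// the result is in microseconds, the unit getFrameAtTime() expects.
fun frameTimestampsUs(durationMs: Long, numFrames: Int): LongArray {
    require(numFrames > 0) { "numFrames must be positive" }
    val durationUs = durationMs * 1000L
    // Sample at the start of each of numFrames equal-length segments.
    return LongArray(numFrames) { i -> durationUs * i / numFrames }
}
```

On the device, each timestamp could be passed to `mmr.getFrameAtTime(timeUs, MediaMetadataRetriever.OPTION_CLOSEST)`. The single-argument overload snaps to the nearest sync (key) frame, which can return the same frame several times for short clips.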

I would go with option #2 and use the Interpreter. Check this file, which has a method to convert a Bitmap to a ByteBuffer.
You can feed the Interpreter with that ByteBuffer. The file contains the bitmapToByteBuffer method, a standard way to turn a Bitmap into a [1, width, height, 3] ByteBuffer. Pay attention to the buffer size, which is multiplied by 4 because each float takes 4 bytes.

You will have to adjust it, though, to include the [frames] dimension your .tflite model expects.
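
Adjusting the single-image conversion to a [1, frames, height, width, 3] layout mostly means writing the frames one after another into one buffer. A minimal sketch, assuming each frame has already been resized and its ARGB pixels read into an `IntArray` (for example with `bitmap.getPixels(pixels, 0, width, 0, 0, width, height)`), and that the model expects RGB floats scaled to [0, 1]; `framesToByteBuffer` is a hypothetical name:

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper: packs frames into a float32 buffer laid out as
// [1, frames, height, width, 3]. Each frame is an ARGB IntArray of size
// width * height in row-major order, as returned by Bitmap.getPixels().
fun framesToByteBuffer(frames: List<IntArray>, width: Int, height: Int): ByteBuffer {
    val bytesPerFloat = 4
    val buffer = ByteBuffer
        .allocateDirect(frames.size * height * width * 3 * bytesPerFloat)
        .order(ByteOrder.nativeOrder())
    for (pixels in frames) {
        require(pixels.size == width * height) {
            "frame has ${pixels.size} pixels, expected ${width * height}"
        }
        for (argb in pixels) {
            // Extract R, G, B and scale each channel to [0, 1].
            buffer.putFloat(((argb shr 16) and 0xFF) / 255f)
            buffer.putFloat(((argb shr 8) and 0xFF) / 255f)
            buffer.putFloat((argb and 0xFF) / 255f)
        }
    }
    buffer.rewind()
    return buffer
}
```

Because frames are written in order, frame i starts at byte offset i * height * width * 3 * 4, so no per-slice positioning is needed.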

Come back if you have more questions.

Thank you! I used the file to convert the Bitmaps to ByteBuffers and then created a TensorBuffer as the final input. However, I got a new error when trying to run the model, and I find these errors very hard to understand. Do you have any recommendations for debugging this type of code when I don't have much experience with Android Studio?

Here is the error:

And this is the current code:

const val RESOLUTION = 172
const val BATCH_SIZE = 1
const val CHANNELS = 3
const val NUM_FRAMES = 20
fun predict(context: Context, mmr: MediaMetadataRetriever): Pair<FloatArray, Long> {
    val frames = videoFrames(mmr)
    // Define the shape and data type of the TensorBuffer
    val shape = intArrayOf(BATCH_SIZE, frames.size, RESOLUTION, RESOLUTION, CHANNELS)
    // Create an empty TensorBuffer with the desired shape and data type
    val tensorBuffer = TensorBuffer.createFixedSize(shape, DataType.FLOAT32)
    // Calculate the size of a single slice based on the shape of the TensorBuffer
    val sliceSize = tensorBuffer.buffer.limit() / tensorBuffer.shape[1]
    // Iterate over each ByteBuffer in the list and load it into the appropriate slice of the TensorBuffer
    for (i in frames.indices) {
        val byteBuffer = frames[i]
        // Calculate the offset for the current slice
        val offset = i * sliceSize
        // Copy the contents of the ByteBuffer to the appropriate slice of the TensorBuffer
        byteBuffer.position(0)
        tensorBuffer.buffer.position(offset)
        tensorBuffer.buffer.put(byteBuffer)
    }

    val output = TensorBuffer.createFixedSize(intArrayOf(1, 100), DataType.FLOAT32)
    var inferenceTime = 0.toLong()

    try {
        val tfliteModel = FileUtil.loadMappedFile(context, "movinet_a1_base_04052024_091340.tflite")
        val tflite = Interpreter(tfliteModel)
        Log.i("signature keys",tflite.signatureKeys.toString())
        // Inference
        val startTime = SystemClock.elapsedRealtime()
        tflite.run(tensorBuffer.buffer, output.buffer) // This is the line causing the error
        inferenceTime = SystemClock.elapsedRealtime()-startTime
    } catch (e: IOException) {
        Log.e("tfliteSupport", "Error reading model", e)
    }

    return Pair(output.floatArray, inferenceTime)
}

private fun videoFrames(mmr: MediaMetadataRetriever): List<ByteBuffer> {
    val frames = mutableListOf<ByteBuffer>()
    var durationMs = 0.0

    val duration = mmr.extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION)
    if (duration !=null) {
        durationMs = duration.toDouble()
    }

    val frameStep = (durationMs/ NUM_FRAMES).toBigDecimal().setScale(2, RoundingMode.DOWN).toDouble()
    for (i in 0 until NUM_FRAMES){
        // getFrameAtTime() expects microseconds; frameStep is in milliseconds
        val timeUs = (i * frameStep * 1000).toLong()
        var bitmap = mmr.getFrameAtTime(timeUs)
        if (bitmap!=null) {
            bitmap = bitmap.copy(Bitmap.Config.ARGB_8888, true)
            val resized = Bitmap.createScaledBitmap(bitmap, RESOLUTION, RESOLUTION,false)
            val inputImage = bitmapToByteBuffer(resized, RESOLUTION, RESOLUTION)
            frames.add(inputImage)
            Log.i("Bitmap", "Registered bitmap frame at $timeUs")
        } else{
            Log.i("No bitmap", "Found no bitmap at frame: $timeUs")
        }
    }
    return frames
}

I suspect the ByteBuffer you are creating is wrong, so check it again.
If that turns out to be very difficult, you can feed the interpreter directly with a nested float array, which in your case will have shape [1, 20, 172, 172, 3]. That will be somewhat slower than feeding a ByteBuffer, but it gives you a head start and an alternative while you revisit the ByteBuffer creation.

I created an updated version of the above code snippet:

fun bitmapArrayToByteBuffer(
    bitmaps: Array<Bitmap>,
    width: Int,
    height: Int,
    mean: Float = 0.0f,
    std: Float = 255.0f
): ByteBuffer {
    val totalBytes = bitmaps.size * width * height * 3 * 4 // Check your case for 20 Bitmaps
    val inputImage = ByteBuffer.allocateDirect(totalBytes)
    inputImage.order(ByteOrder.nativeOrder())

    for (bitmap in bitmaps) {
        val scaledBitmap = scaleBitmapAndKeepRatio(bitmap, width, height)
        val intValues = IntArray(width * height)
        scaledBitmap.getPixels(intValues, 0, width, 0, 0, width, height)

        // Normalize and add pixels for each Bitmap
        for (y in 0 until height) {
            for (x in 0 until width) {
                val value = intValues[y * width + x]
                inputImage.putFloat(((value shr 16 and 0xFF) - mean) / std)
                inputImage.putFloat(((value shr 8 and 0xFF) - mean) / std)
                inputImage.putFloat(((value and 0xFF) - mean) / std)
            }
        }

        scaledBitmap.recycle()  // Free memory after processing
    }

    inputImage.rewind()
    return inputImage
}

Check whether this fixes your error and whether the result is OK.
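
For the nested-array alternative mentioned above, the TFLite Java `Interpreter` also accepts multidimensional primitive arrays, so the input can be built as `Array<Array<Array<Array<FloatArray>>>>`. A minimal sketch, assuming ARGB pixels already read from resized frames (row-major, `width * height` each) and RGB scaled to [0, 1]; `framesToNestedArray` is a hypothetical helper:

```kotlin
// Hypothetical helper: builds a [1, frames, height, width, 3] float array
// from ARGB pixel arrays (row-major, width * height entries each).
fun framesToNestedArray(
    frames: List<IntArray>,
    width: Int,
    height: Int
): Array<Array<Array<Array<FloatArray>>>> =
    Array(1) {
        Array(frames.size) { f ->
            Array(height) { y ->
                Array(width) { x ->
                    val argb = frames[f][y * width + x]
                    floatArrayOf(
                        ((argb shr 16) and 0xFF) / 255f, // R
                        ((argb shr 8) and 0xFF) / 255f,  // G
                        (argb and 0xFF) / 255f           // B
                    )
                }
            }
        }
    }
```

On the device this would be used as `tflite.run(input, output)` with an output such as `Array(1) { FloatArray(numClasses) }`, where `numClasses` is whatever the model's output shape says. The many small arrays are why this route is slower and uses more memory than a direct ByteBuffer.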

I tried your code and a couple of other things, but still got the same error.
So I suspect something is wrong further down in the .tflite model :/

If you suspect something is wrong with your .tflite file, first run inference with the TensorFlow Lite Interpreter API in Python. That way you can verify that the model was converted correctly.
Then you can go back to Android.
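
When comparing the Python `tf.lite.Interpreter` output with what the app produces for the same input, exact equality is too strict because different runtimes can differ slightly in floating point. A tolerance-based check is a reasonable sketch; `outputsAgree` and its default `tolerance` are hypothetical choices, not values from this thread:

```kotlin
import kotlin.math.abs

// Index of the largest value (the predicted class for logits or probabilities).
fun argMax(values: FloatArray): Int {
    require(values.isNotEmpty()) { "values must not be empty" }
    var best = 0
    for (i in 1 until values.size) if (values[i] > values[best]) best = i
    return best
}

// Hypothetical check: same predicted class and every element within tolerance.
fun outputsAgree(reference: FloatArray, actual: FloatArray, tolerance: Float = 1e-3f): Boolean {
    if (reference.size != actual.size) return false
    if (argMax(reference) != argMax(actual)) return false
    return reference.indices.all { abs(reference[it] - actual[it]) <= tolerance }
}
```

The reference values would be copied from the Python run on a fixed test clip and compared with the app's output for the same clip.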

I figured I should post an update in case anyone encounters similar problems.

In the end, I learned that it was not possible to use the base MoViNet model with TFLite, so I ended up using only the stream version of MoViNet, modified to take a whole video as input. This ended up working in the Android app!

If anyone wants more details, just message me here on the TensorFlow Forum or reply to this post. I will also make the GitHub repository public when the project is finished and post another reply with the link :)


Hi, I_M

I have a situation similar to yours. Recently, I switched to the MoViNet stream model and, like you, used a video as input :)

Could you share the code you used to process video input on Android?

Thanks in advance.


Hi!

Of course. This code includes both the video processing and the model inference. By the way, I have not written all the code myself; I combined code from tutorials I found with the help I got earlier in this thread :)
I will also make the whole project public in 1-2 weeks.
If you want to see it before then, you could send me a private message here, and I can share the GitHub repo with you.

import android.content.Context
import android.graphics.Bitmap
import android.graphics.Matrix
import android.graphics.RectF
import android.media.MediaMetadataRetriever
import android.os.SystemClock
import android.util.Log
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.support.common.FileUtil
import org.tensorflow.lite.support.label.Category
import java.nio.ByteBuffer
import java.nio.ByteOrder
import kotlin.math.exp
import kotlin.math.max
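import kotlin.math.roundToInt

// Constants used below but not defined in this snippet. The values are assumptions
// matching the earlier code in this thread (20 frames at 172x172).
private const val TAG = "StreamVideoClassifier"
const val RESOLUTION = 172
const val NUM_FRAMES = 20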

class StreamVideoClassifier private constructor(
     private val interpreter: Interpreter,
     private val labels: List<String>,
     private val maxResults: Int?
) {
    private val outputCategoryCount = interpreter
        .getOutputTensorFromSignature(LOGITS_OUTPUT_NAME, SIGNATURE_KEY)
        .shape()[1]
    private var inputState = HashMap<String, Any>()

    companion object {
        private const val IMAGE_INPUT_NAME = "image"
        private const val LOGITS_OUTPUT_NAME = "logits"
        private const val SIGNATURE_KEY = "serving_default"

        fun createFromFileAndLabelsAndOptions(
            context: Context,
            modelFile: String,
            labelFile: String,
            options: StreamVideoClassifierOptions
        ): StreamVideoClassifier {
            // Create a TFLite interpreter from the TFLite model file.
            val interpreter = Interpreter(FileUtil.loadMappedFile(context, modelFile))

            // Load the label file.
            val labels = FileUtil.loadLabels(context, labelFile)

            // Save the max result option.
            val maxResults = if (options.maxResults > 0 && options.maxResults <= labels.size)
                options.maxResults else null

            return StreamVideoClassifier(interpreter, labels, maxResults)
        }
    }

    init {
        if (outputCategoryCount != labels.size)
            throw java.lang.IllegalArgumentException(
                "Label list size doesn't match with model output shape " +
                        "(${labels.size} != $outputCategoryCount)"
            )
        inputState = initializeInput()
    }

    /**
     * Initialize the input objects and fill them with zeros.
     */
    private fun initializeInput(): HashMap<String, Any> {
        val inputs = HashMap<String, Any>()
        for (inputName in interpreter.getSignatureInputs(SIGNATURE_KEY)) {
            // Skip the input image tensor as it'll be fed in later.
            if (inputName == IMAGE_INPUT_NAME)
                continue

            // Initialize a ByteBuffer filled with zeros as an initial input of the TFLite model.
            val tensor = interpreter.getInputTensorFromSignature(inputName, SIGNATURE_KEY)
            val byteBuffer = ByteBuffer.allocateDirect(tensor.numBytes())
            byteBuffer.order(ByteOrder.nativeOrder())
            inputs[inputName] = byteBuffer
        }

        return inputs
    }

    /**
     * Initialize the output objects to store the TFLite model outputs.
     */
    private fun initializeOutput(): HashMap<String, Any> {
        val outputs = HashMap<String, Any>()
        for (outputName in interpreter.getSignatureOutputs(SIGNATURE_KEY)) {
            // Initialize a ByteBuffer to store the output of the TFLite model.
            val tensor = interpreter.getOutputTensorFromSignature(outputName, SIGNATURE_KEY)
            val byteBuffer = ByteBuffer.allocateDirect(tensor.numBytes())
            byteBuffer.order(ByteOrder.nativeOrder())
            outputs[outputName] = byteBuffer
        }

        return outputs
    }

    /**
     * Classify a video and return the list of actions with their scores, plus the inference time.
     */
    fun classifyVideo(mmr: MediaMetadataRetriever): Pair<List<Category>, Long>{
        Log.d(TAG, "Starting classification")
        val frames = videoFrames(mmr)
        val tensorvideo = bitmapArrayToByteBuffer(frames, RESOLUTION, RESOLUTION)
        inputState[IMAGE_INPUT_NAME] = tensorvideo

        // Initialize a placeholder to store the output objects.
        val outputs = initializeOutput()

        // Run inference using the TFLite model.
        val startTime = SystemClock.elapsedRealtime()
        interpreter.runSignature(inputState, outputs)
        val inferenceTime = SystemClock.elapsedRealtime()-startTime
        // Post-process the outputs.
        var categories = postprocessOutputLogits(outputs[LOGITS_OUTPUT_NAME] as ByteBuffer)

        // Store the output states to feed as input for the next frame.
        outputs.remove(LOGITS_OUTPUT_NAME)
        inputState = outputs

        // Sort the output and return only the top K results.
        categories.sortByDescending { it.score }

        // Keep at most maxResults results (the smaller of maxResults and the list size).
        maxResults?.let {
            categories = categories.subList(0, it.coerceAtMost(categories.size))
        }
        return Pair(categories, inferenceTime)
    }

    /**
     * Convert output logits of the model to a list of Category objects.
     */
    private fun postprocessOutputLogits(logitsByteBuffer: ByteBuffer): MutableList<Category> {
        // Convert ByteBuffer to FloatArray.
        val logits = FloatArray(outputCategoryCount)
        logitsByteBuffer.rewind()
        logitsByteBuffer.asFloatBuffer().get(logits)

        // Convert logits into probability list.
        val probabilities = softmax(logits)

        // Append label name to form a list of Category objects.
        val categories = mutableListOf<Category>()
        probabilities.forEachIndexed { index, probability ->
            categories.add(Category(labels[index], probability))
        }
        return categories
    }

    /**
     * Close the interpreter when it's no longer needed.
     */
    fun close() {
        interpreter.close()
    }

    class StreamVideoClassifierOptions private constructor(
        val maxResults: Int
    ) {
        companion object {
            fun builder() = Builder()
        }

        class Builder {
            private var numThreads: Int = -1
            private var maxResult: Int = -1

            fun setNumThreads(numThreads: Int): Builder {
                this.numThreads = numThreads
                return this
            }

            fun setMaxResult(maxResults: Int): Builder {
                if ((maxResults <= 0) && (maxResults != -1)) {
                    throw IllegalArgumentException("maxResults must be positive or -1.")
                }
                this.maxResult = maxResults
                return this
            }

            fun build(): StreamVideoClassifierOptions {
                return StreamVideoClassifierOptions(this.maxResult)
            }
        }
    }
}

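/**
 * Convert logits into probabilities that sum to 1 (softmax).
 * The largest logit is subtracted first so exp() cannot overflow.
 */
fun softmax(floatArray: FloatArray): FloatArray {
    val maxLogit = floatArray.maxOrNull() ?: return FloatArray(0)
    var total = 0f
    val result = FloatArray(floatArray.size)
    for (i in floatArray.indices) {
        result[i] = exp(floatArray[i] - maxLogit)
        total += result[i]
    }

    for (i in result.indices) {
        result[i] /= total
    }
    return result
}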


/**
 * Getting 20 evenly spread frames from a video and return them as an array of Bitmaps
 */
fun videoFrames(mmr: MediaMetadataRetriever): Array<Bitmap> {
    var frames = emptyArray<Bitmap>()

    val duration = mmr.extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION)
    val numberFrames = mmr.extractMetadata(MediaMetadataRetriever.METADATA_KEY_VIDEO_FRAME_COUNT)
    Log.d(TAG, "Video length: $duration ms")

    val frameStep: Int
    if (numberFrames!=null) {
        frameStep = (numberFrames.toDouble()/ NUM_FRAMES).roundToInt()
        for (i in 0 until NUM_FRAMES){
            var frame = i*frameStep
            if(frame >= numberFrames.toInt()){
                frame = numberFrames.toInt()-1
            }
            val bitmap = mmr.getFrameAtIndex(frame)
            if (bitmap!=null) {
                frames += bitmap
            } else{
                Log.d("No bitmap", "Found no bitmap at frame: $frame")
            }

        }
    }
    return frames
}

/**
 * Convert array of Bitmaps to a ByteBuffer
 * https://github.com/farmaker47/Segmentation_and_Style_Transfer/blob/master/app/src/main/java/com/soloupis/sample/ocr_keras/utils/ImageUtils.kt
 */
fun bitmapArrayToByteBuffer(
    bitmaps: Array<Bitmap>,
    width: Int,
    height: Int,
    mean: Float = 0.0f,
    std: Float = 255.0f
): ByteBuffer {
    val totalBytes = bitmaps.size * width * height * 3 * 4 // Check your case for 20 Bitmaps
    val inputImage = ByteBuffer.allocateDirect(totalBytes)
    inputImage.order(ByteOrder.nativeOrder())

    for (bitmap in bitmaps) {
        // val scaledBitmap = scaleBitmapAndKeepRatio(bitmap, width, height)
        val centerBitmap = if (bitmap.width >= bitmap.height){
            Bitmap.createBitmap(bitmap, bitmap.width/2 - bitmap.height/2, 0, bitmap.height, bitmap.height)
        }else{
            Bitmap.createBitmap(bitmap, 0, bitmap.height/2 - bitmap.width/2, bitmap.width, bitmap.width)
        }
        val intValues = IntArray(width * height)
        val scaledBitmap = scaleBitmapAndKeepRatio(centerBitmap, width, height)
        scaledBitmap.getPixels(intValues, 0, width, 0, 0, width, height)

        // Normalize and add pixels for each Bitmap
        for (y in 0 until height) {
            for (x in 0 until width) {
                val value = intValues[y * width + x]
                inputImage.putFloat(((value shr 16 and 0xFF) - mean) / std)
                inputImage.putFloat(((value shr 8 and 0xFF) - mean) / std)
                inputImage.putFloat(((value and 0xFF) - mean) / std)
            }
        }
        scaledBitmap.recycle()  // Free memory after processing
        bitmap.recycle()
    }

    inputImage.rewind()
    return inputImage
}

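/**
 * Scale a Bitmap to the requested size.
 * Note: the source rectangle uses targetBmp.width for both sides, so this assumes a square
 * input (the center crop in bitmapArrayToByteBuffer guarantees that).
 * https://github.com/farmaker47/Segmentation_and_Style_Transfer/blob/master/app/src/main/java/com/soloupis/sample/ocr_keras/utils/ImageUtils.kt
 */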
fun scaleBitmapAndKeepRatio(
    targetBmp: Bitmap,
    reqHeightInPixels: Int,
    reqWidthInPixels: Int
): Bitmap {
    if (targetBmp.height == reqHeightInPixels && targetBmp.width == reqWidthInPixels) {
        return targetBmp
    }
    val matrix = Matrix()
    matrix.setRectToRect(
        RectF(
            0f, 0f,
            targetBmp.width.toFloat(),
            targetBmp.width.toFloat()
        ),
        RectF(
            0f, 0f,
            reqWidthInPixels.toFloat(),
            reqHeightInPixels.toFloat()
        ),
        Matrix.ScaleToFit.FILL
    )
    return Bitmap.createBitmap(
        targetBmp, 0, 0,
        targetBmp.width,
        targetBmp.width, matrix, true
    )
}
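
The center crop in `bitmapArrayToByteBuffer` takes the largest square in the middle of the frame before scaling, so the image is not distorted when it is resized to RESOLUTION x RESOLUTION. The geometry can be sketched as a pure function; `centerSquareCrop` and `CropRect` are hypothetical names, and the result maps onto `Bitmap.createBitmap(bitmap, x, y, size, size)`:

```kotlin
// Hypothetical type: a square crop region inside a frame.
data class CropRect(val x: Int, val y: Int, val size: Int)

// Hypothetical helper: the largest centered square inside a width x height frame.
fun centerSquareCrop(width: Int, height: Int): CropRect {
    require(width > 0 && height > 0) { "dimensions must be positive" }
    return if (width >= height) {
        // Landscape (or square): keep the full height, trim left and right equally.
        CropRect(x = width / 2 - height / 2, y = 0, size = height)
    } else {
        // Portrait: keep the full width, trim top and bottom equally.
        CropRect(x = 0, y = height / 2 - width / 2, size = width)
    }
}
```

For a 1920x1080 frame this gives a 1080x1080 square starting at x = 420, and the crop always stays inside the frame, also for odd dimensions.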

The project is finished, and the GitHub repository is now public :)
