I have a CNN model that I have converted to TFLite and want to run on an Android phone's GPU. I run it like this:
TfLiteGpu.isGpuDelegateAvailable(context).addOnSuccessListener { isGPUAvailable ->
    try {
        // Use the TFLite runtime provided by the system (Play services) and request the GPU delegate.
        val options = InterpreterApi.Options()
            .setRuntime(InterpreterApi.Options.TfLiteRuntime.FROM_SYSTEM_ONLY)
            .addDelegateFactory(GpuDelegateFactory())
        val interpreterAPI = InterpreterApi.create(modelFile, options)
        interpreterAPI.run(input, output)
        resultGPUText = output[0].joinToString(separator = "\n")
    } catch (e: Exception) {
        errorText = e.message ?: ""
        Log.e("tflite", e.message, e)
    }
}
It runs about 75% faster on the GPU than on the CPU, so that is good.
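In case the way I measure matters: this is roughly how I compare the two configurations (a simplified sketch, not my exact code; timeInference is just an illustrative helper, and modelFile, input, and output are the same objects as in the snippet above):

import org.tensorflow.lite.InterpreterApi
import org.tensorflow.lite.gpu.GpuDelegateFactory
import java.io.File
import kotlin.system.measureTimeMillis

// Illustrative helper (assumption, not my real code): builds an interpreter with or
// without the GPU delegate and times a single inference after one warm-up run.
fun timeInference(modelFile: File, input: Any, output: Any, useGpu: Boolean): Long {
    val options = InterpreterApi.Options()
        .setRuntime(InterpreterApi.Options.TfLiteRuntime.FROM_SYSTEM_ONLY)
    if (useGpu) {
        options.addDelegateFactory(GpuDelegateFactory())
    }
    val interpreter = InterpreterApi.create(modelFile, options)
    interpreter.run(input, output) // warm-up so delegate initialization is not counted
    val millis = measureTimeMillis { interpreter.run(input, output) }
    interpreter.close()
    return millis
}

The ~75% figure comes from comparing this kind of timing with useGpu = true against useGpu = false.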
However, this is the log I get:
Initialized TensorFlow Lite runtime.
Replacing 52 out of 65 node(s) with delegate (TfLiteXNNPackDelegate) node, yielding 6 partitions for the whole graph.
Created interpreter.
Created TensorFlow Lite delegate for GPU.
Replacing 29 out of 65 node(s) with delegate (TfLiteGpuDelegateV2) node, yielding 3 partitions for the whole graph.
Initialized OpenCL-based API.
Created 1 GPU delegate kernels.
Replacing 25 out of 37 node(s) with delegate (TfLiteXNNPackDelegate) node, yielding 2 partitions for the whole graph.
Created interpreter.
I don't fully understand what this means, but I'm reading it as: after some optimization the model is split in two, and 25 out of the 37 remaining nodes are still run on the CPU rather than the GPU.
I did use this function when converting the model to tflite:
tf.lite.experimental.Analyzer.analyze(model_content=tflite_model, gpu_compatibility=True)
which said that everything except the first operation, a cast from uint8 to float32, was GPU compatible.
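For context, the tflite_model in that call comes out of a standard converter run, roughly like this (a sketch; from_keras_model is an assumption on my part, it could just as well be from_saved_model, and I've left out the exact converter flags):

import tensorflow as tf

# Sketch of the conversion step; `model` is the trained Keras CNN (assumption),
# and any optimization flags are omitted/simplified here.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# The compatibility check quoted above, run on the converted model:
tf.lite.experimental.Analyzer.analyze(model_content=tflite_model, gpu_compatibility=True)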
Why is so much still run on the CPU? How can I see which operations run where?
Thanks!