Neural network fallback to CPU using NNAPI on Android

Hello!
We are currently trying to run a neural network with NNAPI on Android, but it always falls back to the CPU. If we launch it with a hard-coded GPU delegate, it works fine.
Has anyone experienced this issue? Any recommendations?
This is our device list:

Pixel 6
Oppo Reno 5 Pro
Samsung Galaxy S10e
Xiaomi Redmi Note 10 5G
Microsoft Surface Duo
Samsung Galaxy Note 10+
Google Pixel 3XL
Samsung Galaxy S21
Samsung Galaxy S21+
Xiaomi Mi 8
Google Pixel 4XL

Have you checked whether your model uses only NNAPI-supported ops:

// ***********************
// Support for NNAPI operations improved significantly from Android API level 28 (Android Pie) onwards,
// so developers are advised to use the NNAPI delegate on Android Pie or above for most scenarios.
// ***********************

import android.os.Build;
import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.nnapi.NnApiDelegate;

Interpreter.Options options = new Interpreter.Options();
NnApiDelegate nnApiDelegate = null;
// Initialize interpreter with NNAPI delegate for Android Pie or above
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.P) {
    nnApiDelegate = new NnApiDelegate();
    options.addDelegate(nnApiDelegate);
}

// Initialize TFLite interpreter
// (tfLite, loadModelFile, assetManager and modelFilename are defined elsewhere in the app)
try {
    tfLite = new Interpreter(loadModelFile(assetManager, modelFilename), options);
} catch (Exception e) {
    throw new RuntimeException(e);
}

// Run inference
// …

// Unload delegate: close the interpreter first, then the delegate it was using
tfLite.close();
if (nnApiDelegate != null) {
    nnApiDelegate.close();
}

Thank you. I’ve confirmed with our data science team that we are using only supported NNAPI ops. We have also tried ResNet, with the same result. We also created a test app for testing purposes.
Maybe there are some other ideas?

In the test app we can also switch the delegate (CPU, GPU, NNAPI) on the fly, and we have a table of benchmarks.
Android NN perfomance for tensorflow forum - Google Sheets spreadsheet with metrics. I’ve added a picture of the test app to the second page; we run the NN 100 times and take the average.
Our problem is not that we are unable to run it, but that NNAPI performance is not as high as expected and we see CPU fallback. We would be grateful for any ideas.

Have you tried benchmarking it with the TFLite benchmark tool, using the different NNAPI parameters it has?

We tried it. Here is the output.

NNAPI

STARTING!
Log parameter values verbosely: [0]
Graph: [/data/local/tmp/android_segmenter_3ch.tflite]
Enable op profiling: [1]
Use NNAPI: [1]
NNAPI accelerators available: [eden-drv,nnapi-reference]
Use xnnpack: [0]
Loaded model /data/local/tmp/android_segmenter_3ch.tflite
INFO: Initialized TensorFlow Lite runtime.
NNAPI delegate created.
INFO: Created TensorFlow Lite delegate for NNAPI.
Though NNAPI delegate is explicitly applied, the model graph will not be executed by the delegate.
The input model file size (MB): 6.35428
Initialized session in 34.939ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=5 first=130492 curr=111222 min=111222 max=130492 avg=116523 std=7192

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=112256 curr=111087 min=110425 max=114213 avg=112533 std=778

Inference timings in us: Init: 34939, First inference: 130492, Warmup (avg): 116523, Inference (avg): 112533
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=6.70312 overall=63.4023
Profiling Info for Benchmark Initialization:
============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
ModifyGraphWithDelegate 0.000 30.388 30.388 99.016% 99.016% 2216.000 1 ModifyGraphWithDelegate/0
AllocateTensors 30.284 0.298 0.151 0.984% 100.000% 0.000 2 AllocateTensors/0

============================== Top by Computation Time ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
ModifyGraphWithDelegate 0.000 30.388 30.388 99.016% 99.016% 2216.000 1 ModifyGraphWithDelegate/0
AllocateTensors 30.284 0.298 0.151 0.984% 100.000% 0.000 2 AllocateTensors/0

Number of nodes executed: 2
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
ModifyGraphWithDelegate 1 30.388 99.016% 99.016% 2216.000 1
AllocateTensors 1 0.302 0.984% 100.000% 0.000 2

Timings (microseconds): count=1 curr=30690
Memory (bytes): count=0
2 nodes observed

Operator-wise Profiling Info for Regular Benchmark Runs:
============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
PAD 0.005 0.150 0.133 0.119% 0.119% 0.000 1 [model/sequential/zero_padding2d/Pad]:0
CONV_2D 0.140 3.266 3.256 2.894% 3.013% 0.000 1 [model/re_lu/Relu;model/batch_normalization/FusedBatchNormV3;model/batch_normalization_1/FusedBatchNormV3;model/depthwise_conv2d/depthwise;model/conv2d_1/Conv2D;model/sequential/conv2d/Conv2D]:1
DEPTHWISE_CONV_2D 3.396 1.990 2.009 1.786% 4.799% 0.000 1 [model/re_lu_1/Relu;model/batch_normalization_1/FusedBatchNormV3;model/depthwise_conv2d/depthwise;model/conv2d_1/Conv2D]:2
CONV_2D 5.406 1.173 1.181 1.050% 5.849% 0.000 1 [model/batch_normalization_2/FusedBatchNormV3;model/conv2d_1/Conv2D1]:3
ADD 6.587 0.422 0.403 0.359% 6.208% 0.000 1 [model/tf.math.add/Add]:4
CONV_2D 6.991 3.037 3.078 2.737% 8.945% 0.000 1 [model/re_lu_2/Relu;model/batch_normalization_3/FusedBatchNormV3;model/batch_normalization_4/FusedBatchNormV3;model/sequential_1/depthwise_conv2d_1/depthwise;model/conv2d_2/Conv2D]:5
PAD 10.070 1.218 1.171 1.041% 9.986% 0.000 1 [model/sequential_1/zero_padding2d_1/Pad]:6
DEPTHWISE_CONV_2D 11.242 2.945 2.920 2.596% 12.582% 0.000 1 [model/re_lu_3/Relu;model/batch_normalization_4/FusedBatchNormV3;model/sequential_1/depthwise_conv2d_1/depthwise]:7
CONV_2D 14.163 1.228 1.212 1.078% 13.660% 0.000 1 [model/batch_normalization_5/FusedBatchNormV3;model/depthwise_conv2d_23/depthwise;model/conv2d_3/Conv2D1]:8
CONV_2D 15.376 1.171 1.173 1.043% 14.703% 0.000 1 [model/re_lu_4/Relu;model/batch_normalization_6/FusedBatchNormV3;model/batch_normalization_10/FusedBatchNormV3;model/sequential_2/depthwise_conv2d_3/depthwise;model/depthwise_conv2d_21/depthwise;model/conv2d_4/Conv2D]:9
DEPTHWISE_CONV_2D 16.549 2.381 2.369 2.106% 16.809% 0.000 1 [model/re_lu_5/Relu;model/batch_normalization_7/FusedBatchNormV3;model/batch_normalization_10/FusedBatchNormV3;model/sequential_2/depthwise_conv2d_3/depthwise;model/depthwise_conv2d_21/depthwise;model/depthwise_conv2d_2/depthwise]:10
CONV_2D 18.919 1.364 1.375 1.223% 18.032% 0.000 1 [model/batch_normalization_8/FusedBatchNormV3;model/depthwise_conv2d_23/depthwise;model/conv2d_5/Conv2D1]:11
ADD 20.295 0.171 0.173 0.154% 18.186% 0.000 1 [model/tf.math.add_1/Add]:12
CONV_2D 20.469 1.146 1.152 1.025% 19.210% 0.000 1 [model/re_lu_6/Relu;model/batch_normalization_9/FusedBatchNormV3;model/batch_normalization_10/FusedBatchNormV3;model/sequential_2/depthwise_conv2d_3/depthwise;model/depthwise_conv2d_21/depthwise;model/conv2d_6/Conv2D]:13
PAD 21.621 0.268 0.275 0.245% 19.455% 0.000 1 [model/sequential_2/zero_padding2d_2/Pad]:14
DEPTHWISE_CONV_2D 21.897 0.571 0.589 0.523% 19.978% 0.000 1 [model/re_lu_7/Relu;model/batch_normalization_10/FusedBatchNormV3;model/sequential_2/depthwise_conv2d_3/depthwise;model/depthwise_conv2d_21/depthwise]:15
CONV_2D 22.486 0.519 0.496 0.441% 20.419% 0.000 1 [model/batch_normalization_11/FusedBatchNormV3;model/depthwise_conv2d_19/depthwise;model/conv2d_7/Conv2D1]:16
CONV_2D 22.982 0.716 0.710 0.631% 21.050% 0.000 1 [model/re_lu_8/Relu;model/batch_normalization_12/FusedBatchNormV3;model/batch_normalization_16/FusedBatchNormV3;model/depthwise_conv2d_5/depthwise;model/conv2d_8/Conv2D]:17
DEPTHWISE_CONV_2D 23.692 0.689 0.803 0.714% 21.764% 0.000 1 [model/re_lu_9/Relu;model/batch_normalization_13/FusedBatchNormV3;model/batch_normalization_16/FusedBatchNormV3;model/depthwise_conv2d_5/depthwise;model/depthwise_conv2d_4/depthwise]:18
CONV_2D 24.496 0.792 0.796 0.707% 22.471% 0.000 1 [model/batch_normalization_14/FusedBatchNormV3;model/depthwise_conv2d_19/depthwise;model/conv2d_9/Conv2D1]:19
ADD 25.292 0.049 0.044 0.039% 22.510% 0.000 1 [model/tf.math.add_2/Add]:20
CONV_2D 25.336 0.705 0.706 0.628% 23.138% 0.000 1 [model/re_lu_10/Relu;model/batch_normalization_15/FusedBatchNormV3;model/batch_normalization_16/FusedBatchNormV3;model/depthwise_conv2d_5/depthwise;model/conv2d_10/Conv2D]:21
DEPTHWISE_CONV_2D 26.042 0.735 0.835 0.742% 23.881% 0.000 1 [model/re_lu_11/Relu;model/batch_normalization_16/FusedBatchNormV3;model/depthwise_conv2d_5/depthwise]:22
CONV_2D 26.878 0.786 0.781 0.694% 24.575% 0.000 1 [model/batch_normalization_17/FusedBatchNormV3;model/depthwise_conv2d_19/depthwise;model/conv2d_11/Conv2D1]:23
ADD 27.659 0.044 0.040 0.036% 24.611% 0.000 1 [model/tf.math.add_3/Add]:24
CONV_2D 27.700 1.350 1.347 1.198% 25.809% 0.000 1 [model/re_lu_12/Relu;model/batch_normalization_18/FusedBatchNormV3;model/batch_normalization_19/FusedBatchNormV3;model/sequential_3/depthwise_conv2d_6/depthwise;model/conv2d_12/Conv2D]:25
PAD 29.048 0.217 0.185 0.165% 25.973% 0.000 1 [model/sequential_3/zero_padding2d_3/Pad]:26
DEPTHWISE_CONV_2D 29.233 0.492 0.467 0.415% 26.389% 0.000 1 [model/re_lu_13/Relu;model/batch_normalization_19/FusedBatchNormV3;model/sequential_3/depthwise_conv2d_6/depthwise]:27
CONV_2D 29.701 0.709 0.707 0.628% 27.017% 0.000 1 [model/batch_normalization_20/FusedBatchNormV3;model/conv2d_19/Conv2D;model/conv2d_13/Conv2D1]:28
CONV_2D 30.408 0.581 0.553 0.492% 27.509% 0.000 1 [model/re_lu_14/Relu;model/batch_normalization_21/FusedBatchNormV3;model/batch_normalization_22/FusedBatchNormV3;model/depthwise_conv2d_7/depthwise;model/conv2d_14/Conv2D]:29
DEPTHWISE_CONV_2D 30.962 0.303 0.330 0.294% 27.803% 0.000 1 [model/re_lu_15/Relu;model/batch_normalization_22/FusedBatchNormV3;model/depthwise_conv2d_7/depthwise]:30
CONV_2D 31.292 0.610 0.602 0.535% 28.338% 0.000 1 [model/batch_normalization_23/FusedBatchNormV3;model/conv2d_19/Conv2D;model/conv2d_15/Conv2D1]:31
ADD 31.894 0.022 0.022 0.020% 28.358% 0.000 1 [model/tf.math.add_4/Add]:32
CONV_2D 31.917 0.509 0.514 0.457% 28.815% 0.000 1 [model/re_lu_16/Relu;model/batch_normalization_24/FusedBatchNormV3;model/batch_normalization_28/FusedBatchNormV3;model/depthwise_conv2d_9/depthwise;model/conv2d_16/Conv2D]:33
DEPTHWISE_CONV_2D 32.431 0.285 0.282 0.251% 29.065% 0.000 1 [model/re_lu_17/Relu;model/batch_normalization_25/FusedBatchNormV3;model/batch_normalization_28/FusedBatchNormV3;model/depthwise_conv2d_9/depthwise;model/depthwise_conv2d_8/depthwise]:34
CONV_2D 32.713 0.538 0.534 0.474% 29.540% 0.000 1 [model/batch_normalization_26/FusedBatchNormV3;model/conv2d_19/Conv2D;model/conv2d_17/Conv2D1]:35
ADD 33.247 0.021 0.022 0.020% 29.559% 0.000 1 [model/tf.math.add_5/Add]:36
CONV_2D 33.270 0.526 0.521 0.463% 30.022% 0.000 1 [model/re_lu_18/Relu;model/batch_normalization_27/FusedBatchNormV3;model/batch_normalization_28/FusedBatchNormV3;model/depthwise_conv2d_9/depthwise;model/conv2d_18/Conv2D]:37
DEPTHWISE_CONV_2D 33.791 0.294 0.280 0.249% 30.271% 0.000 1 [model/re_lu_19/Relu;model/batch_normalization_28/FusedBatchNormV3;model/depthwise_conv2d_9/depthwise]:38
CONV_2D 34.071 0.546 0.537 0.478% 30.749% 0.000 1 [model/batch_normalization_29/FusedBatchNormV3;model/conv2d_19/Conv2D1]:39
ADD 34.608 0.023 0.023 0.021% 30.769% 0.000 1 [model/tf.math.add_6/Add]:40
CONV_2D 34.632 1.284 1.270 1.129% 31.898% 0.000 1 [model/re_lu_20/Relu;model/batch_normalization_30/FusedBatchNormV3;model/batch_normalization_31/FusedBatchNormV3;model/depthwise_conv2d_10/depthwise;model/conv2d_20/Conv2D]:41
DEPTHWISE_CONV_2D 35.902 0.807 0.778 0.692% 32.590% 0.000 1 [model/re_lu_21/Relu;model/batch_normalization_31/FusedBatchNormV3;model/depthwise_conv2d_10/depthwise]:42
CONV_2D 36.681 1.888 2.001 1.779% 34.369% 0.000 1 [model/batch_normalization_32/FusedBatchNormV3;model/depthwise_conv2d_20/depthwise;model/conv2d_21/Conv2D1]:43
CONV_2D 38.682 2.474 2.469 2.195% 36.564% 0.000 1 [model/re_lu_22/Relu;model/batch_normalization_33/FusedBatchNormV3;model/batch_normalization_37/FusedBatchNormV3;model/sequential_4/depthwise_conv2d_12/depthwise;model/conv2d_22/Conv2D]:44
DEPTHWISE_CONV_2D 41.152 1.040 1.117 0.993% 37.557% 0.000 1 [model/re_lu_23/Relu;model/batch_normalization_34/FusedBatchNormV3;model/batch_normalization_37/FusedBatchNormV3;model/sequential_4/depthwise_conv2d_12/depthwise;model/depthwise_conv2d_11/depthwise]:45
CONV_2D 42.269 2.785 2.827 2.513% 40.070% 0.000 1 [model/batch_normalization_35/FusedBatchNormV3;model/depthwise_conv2d_20/depthwise;model/conv2d_23/Conv2D1]:46
ADD 45.096 0.044 0.042 0.037% 40.107% 0.000 1 [model/tf.math.add_7/Add]:47
CONV_2D 45.138 2.477 2.474 2.200% 42.307% 0.000 1 [model/re_lu_24/Relu;model/batch_normalization_36/FusedBatchNormV3;model/batch_normalization_37/FusedBatchNormV3;model/sequential_4/depthwise_conv2d_12/depthwise;model/conv2d_24/Conv2D]:48
PAD 47.612 0.204 0.184 0.164% 42.470% 0.000 1 [model/sequential_4/zero_padding2d_4/Pad]:49
DEPTHWISE_CONV_2D 47.797 0.368 0.353 0.314% 42.784% 0.000 1 [model/re_lu_25/Relu;model/batch_normalization_37/FusedBatchNormV3;model/sequential_4/depthwise_conv2d_12/depthwise]:50
CONV_2D 48.150 1.051 1.067 0.948% 43.732% 0.000 1 [model/batch_normalization_38/FusedBatchNormV3;model/conv2d_29/Conv2D;model/conv2d_25/Conv2D1]:51
CONV_2D 49.217 1.328 1.337 1.188% 44.921% 0.000 1 [model/re_lu_26/Relu;model/batch_normalization_39/FusedBatchNormV3;model/batch_normalization_43/FusedBatchNormV3;model/depthwise_conv2d_14/depthwise;model/depthwise_conv2d_15/depthwise;model/conv2d_26/Conv2D]:52
DEPTHWISE_CONV_2D 50.554 0.384 0.372 0.331% 45.252% 0.000 1 [model/re_lu_27/Relu;model/batch_normalization_40/FusedBatchNormV3;model/batch_normalization_43/FusedBatchNormV3;model/depthwise_conv2d_14/depthwise;model/depthwise_conv2d_15/depthwise;model/depthwise_conv2d_13/depthwise]:53
CONV_2D 50.927 1.452 1.465 1.303% 46.554% 0.000 1 [model/batch_normalization_41/FusedBatchNormV3;model/conv2d_29/Conv2D;model/conv2d_27/Conv2D1]:54
ADD 52.393 0.012 0.012 0.011% 46.566% 0.000 1 [model/tf.math.add_8/Add]:55
CONV_2D 52.406 1.323 1.331 1.184% 47.749% 0.000 1 [model/re_lu_28/Relu;model/batch_normalization_42/FusedBatchNormV3;model/batch_normalization_43/FusedBatchNormV3;model/depthwise_conv2d_14/depthwise;model/depthwise_conv2d_15/depthwise;model/conv2d_28/Conv2D]:56
DEPTHWISE_CONV_2D 53.737 0.361 0.353 0.314% 48.063% 0.000 1 [model/re_lu_29/Relu;model/batch_normalization_43/FusedBatchNormV3;model/depthwise_conv2d_14/depthwise;model/depthwise_conv2d_15/depthwise]:57
CONV_2D 54.091 1.415 1.429 1.270% 49.333% 0.000 1 [model/batch_normalization_44/FusedBatchNormV3;model/conv2d_29/Conv2D1]:58
ADD 55.520 0.012 0.011 0.010% 49.343% 0.000 1 [model/tf.math.add_9/Add]:59
CONV_2D 55.531 1.328 1.328 1.181% 50.524% 0.000 1 [model/re_lu_30/Relu;model/batch_normalization_45/FusedBatchNormV3;model/batch_normalization_43/FusedBatchNormV3;model/depthwise_conv2d_14/depthwise;model/depthwise_conv2d_15/depthwise;model/conv2d_30/Conv2D]:60
DEPTHWISE_CONV_2D 56.860 0.357 0.356 0.317% 50.840% 0.000 1 [model/depthwise_conv2d_15/depthwise1]:61
CONV_2D 57.216 1.030 1.011 0.899% 51.740% 0.000 1 [model/re_lu_31/Relu;model/batch_normalization_46/FusedBatchNormV3;model/depthwise_conv2d_20/depthwise;model/conv2d_31/Conv2D]:62
RESIZE_NEAREST_NEIGHBOR 58.228 0.019 0.018 0.016% 51.755% 0.000 1 [model/up_sampling2d/resize/ResizeNearestNeighbor]:63
DEPTHWISE_CONV_2D 58.246 0.191 0.191 0.170% 51.925% 0.000 1 [model/depthwise_conv2d_16/depthwise1]:64
CONV_2D 58.438 0.472 0.458 0.407% 52.333% 0.000 1 [model/re_lu_32/Relu;model/batch_normalization_47/FusedBatchNormV3;model/depthwise_conv2d_20/depthwise;model/conv2d_32/Conv2D]:65
DEPTHWISE_CONV_2D 58.896 0.177 0.191 0.170% 52.503% 0.000 1 [model/depthwise_conv2d_17/depthwise1]:66
CONV_2D 59.087 0.439 0.454 0.403% 52.906% 0.000 1 [model/re_lu_33/Relu;model/batch_normalization_48/FusedBatchNormV3;model/depthwise_conv2d_20/depthwise;model/conv2d_33/Conv2D]:67
ADD 59.541 0.031 0.030 0.027% 52.933% 0.000 1 [model/tf.math.add_10/Add]:68
RESIZE_NEAREST_NEIGHBOR 59.572 0.061 0.061 0.055% 52.988% 0.000 1 [model/up_sampling2d_1/resize/ResizeNearestNeighbor]:69
DEPTHWISE_CONV_2D 59.633 0.797 0.785 0.698% 53.686% 0.000 1 [model/depthwise_conv2d_18/depthwise1]:70
CONV_2D 60.419 1.807 1.794 1.595% 55.281% 0.000 1 [model/re_lu_34/Relu;model/batch_normalization_49/FusedBatchNormV3;model/depthwise_conv2d_20/depthwise;model/conv2d_34/Conv2D]:71
DEPTHWISE_CONV_2D 62.213 0.368 0.355 0.316% 55.596% 0.000 1 [model/depthwise_conv2d_19/depthwise2]:72
CONV_2D 62.569 0.665 0.653 0.580% 56.177% 0.000 1 [model/re_lu_35/Relu;model/batch_normalization_50/FusedBatchNormV3;model/depthwise_conv2d_20/depthwise;model/conv2d_35/Conv2D]:73
ADD 63.222 0.110 0.109 0.097% 56.274% 0.000 1 [model/tf.math.add_11/Add]:74
DEPTHWISE_CONV_2D 63.332 0.673 0.788 0.701% 56.975% 0.000 1 [model/depthwise_conv2d_20/depthwise2]:75
CONV_2D 64.120 1.168 1.169 1.039% 58.014% 0.000 1 [model/re_lu_36/Relu;model/batch_normalization_51/FusedBatchNormV3;model/batch_normalization_10/FusedBatchNormV3;model/sequential_2/depthwise_conv2d_3/depthwise;model/depthwise_conv2d_21/depthwise;model/conv2d_36/Conv2D]:76
RESIZE_NEAREST_NEIGHBOR 65.290 0.335 0.318 0.283% 58.297% 0.000 1 [model/up_sampling2d_2/resize/ResizeNearestNeighbor]:77
DEPTHWISE_CONV_2D 65.609 2.184 2.158 1.919% 60.216% 0.000 1 [model/depthwise_conv2d_21/depthwise1]:78
CONV_2D 67.767 2.271 2.284 2.031% 62.247% 0.000 1 [model/re_lu_37/Relu;model/batch_normalization_52/FusedBatchNormV3;model/depthwise_conv2d_22/depthwise;model/conv2d_37/Conv2D]:79
RESIZE_NEAREST_NEIGHBOR 70.052 0.735 0.738 0.656% 62.903% 0.000 1 [model/up_sampling2d_3/resize/ResizeNearestNeighbor]:80
DEPTHWISE_CONV_2D 70.791 6.391 6.568 5.840% 68.742% 0.000 1 [model/depthwise_conv2d_22/depthwise2]:81
CONV_2D 77.361 3.830 3.837 3.412% 72.154% 0.000 1 [model/re_lu_38/Relu;model/batch_normalization_53/FusedBatchNormV3;model/depthwise_conv2d_23/depthwise;model/conv2d_38/Conv2D]:82
RESIZE_NEAREST_NEIGHBOR 81.199 1.524 1.534 1.364% 73.518% 0.000 1 [model/up_sampling2d_4/resize/ResizeNearestNeighbor]:83
DEPTHWISE_CONV_2D 82.734 17.310 17.460 15.523% 89.041% 0.000 1 [model/depthwise_conv2d_23/depthwise2]:84
CONV_2D 100.196 5.083 5.117 4.549% 93.590% 0.000 1 [model/conv2d_39/BiasAdd;model/conv2d_39/Conv2D;conv2d_39/bias1]:85
SOFTMAX 105.314 7.563 7.210 6.410% 100.000% 0.000 1 [StatefulPartitionedCall:0]:86

============================== Top by Computation Time ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
DEPTHWISE_CONV_2D 82.734 17.310 17.460 15.523% 15.523% 0.000 1 [model/depthwise_conv2d_23/depthwise2]:84
SOFTMAX 105.314 7.563 7.210 6.410% 21.933% 0.000 1 [StatefulPartitionedCall:0]:86
DEPTHWISE_CONV_2D 70.791 6.391 6.568 5.840% 27.772% 0.000 1 [model/depthwise_conv2d_22/depthwise2]:81
CONV_2D 100.196 5.083 5.117 4.549% 32.322% 0.000 1 [model/conv2d_39/BiasAdd;model/conv2d_39/Conv2D;conv2d_39/bias1]:85
CONV_2D 77.361 3.830 3.837 3.412% 35.733% 0.000 1 [model/re_lu_38/Relu;model/batch_normalization_53/FusedBatchNormV3;model/depthwise_conv2d_23/depthwise;model/conv2d_38/Conv2D]:82
CONV_2D 0.140 3.266 3.256 2.894% 38.628% 0.000 1 [model/re_lu/Relu;model/batch_normalization/FusedBatchNormV3;model/batch_normalization_1/FusedBatchNormV3;model/depthwise_conv2d/depthwise;model/conv2d_1/Conv2D;model/sequential/conv2d/Conv2D]:1
CONV_2D 6.991 3.037 3.078 2.737% 41.364% 0.000 1 [model/re_lu_2/Relu;model/batch_normalization_3/FusedBatchNormV3;model/batch_normalization_4/FusedBatchNormV3;model/sequential_1/depthwise_conv2d_1/depthwise;model/conv2d_2/Conv2D]:5
DEPTHWISE_CONV_2D 11.242 2.945 2.920 2.596% 43.961% 0.000 1 [model/re_lu_3/Relu;model/batch_normalization_4/FusedBatchNormV3;model/sequential_1/depthwise_conv2d_1/depthwise]:7
CONV_2D 42.269 2.785 2.827 2.513% 46.474% 0.000 1 [model/batch_normalization_35/FusedBatchNormV3;model/depthwise_conv2d_20/depthwise;model/conv2d_23/Conv2D1]:46
CONV_2D 45.138 2.477 2.474 2.200% 48.673% 0.000 1 [model/re_lu_24/Relu;model/batch_normalization_36/FusedBatchNormV3;model/batch_normalization_37/FusedBatchNormV3;model/sequential_4/depthwise_conv2d_12/depthwise;model/conv2d_24/Conv2D]:48

Number of nodes executed: 87
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
CONV_2D 40 56.985 50.682% 50.682% 0.000 40
DEPTHWISE_CONV_2D 24 42.700 37.977% 88.659% 0.000 24
SOFTMAX 1 7.209 6.412% 95.070% 0.000 1
RESIZE_NEAREST_NEIGHBOR 5 2.667 2.372% 97.442% 0.000 5
PAD 5 1.948 1.733% 99.175% 0.000 5
ADD 12 0.928 0.825% 100.000% 0.000 12

Timings (microseconds): count=50 first=112190 curr=111029 min=110364 max=114134 avg=112478 std=778
Memory (bytes): count=0
87 nodes observed
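
A note on reading this output: the line "Though NNAPI delegate is explicitly applied, the model graph will not be executed by the delegate", together with all 87 nodes being profiled as individual built-in ops, suggests that nothing was handed to an NNAPI accelerator. A successful delegation is instead reported with a "Replacing N node(s) with delegate" line. Below is a minimal Java sketch for checking this automatically on captured benchmark output. DelegationCheck and delegatedNodes are hypothetical names, not part of TensorFlow Lite, and the matched strings are simply taken from the logs in this thread, so they may differ in other tool versions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a TensorFlow Lite API): inspects captured benchmark tool output. */
final class DelegationCheck {
    // Matches e.g. "INFO: Replacing 87 node(s) with delegate (TfLiteGpuDelegateV2) node, yielding 1 partitions."
    private static final Pattern REPLACING =
            Pattern.compile("Replacing (\\d+) node\\(s\\) with delegate");
    // Substring of the message printed when the delegate took no nodes.
    private static final String NOT_EXECUTED = "will not be executed by the delegate";

    /**
     * Returns the number of nodes the log reports as delegated,
     * 0 if the log states that the delegate will not execute the graph,
     * or -1 if neither message was found.
     */
    static int delegatedNodes(List<String> logLines) {
        int total = -1;
        for (String line : logLines) {
            Matcher m = REPLACING.matcher(line);
            if (m.find()) {
                total = Math.max(total, 0) + Integer.parseInt(m.group(1));
            } else if (line.contains(NOT_EXECUTED) && total < 0) {
                total = 0;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> nnapiLog = Arrays.asList(
                "INFO: Created TensorFlow Lite delegate for NNAPI.",
                "Though NNAPI delegate is explicitly applied, the model graph will not be executed by the delegate.");
        System.out.println("NNAPI delegated nodes: " + delegatedNodes(nnapiLog));
    }
}
```

A result of 0 means the run should be treated as a CPU measurement, whatever delegate was requested.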

GPU

STARTING!
Log parameter values verbosely: [0]
Graph: [/data/local/tmp/android_segmenter_3ch.tflite]
Enable op profiling: [1]
Use gpu: [1]
Loaded model /data/local/tmp/android_segmenter_3ch.tflite
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
GPU delegate created.
INFO: Replacing 87 node(s) with delegate (TfLiteGpuDelegateV2) node, yielding 1 partitions.
INFO: Initialized OpenCL-based API.
Explicitly applied GPU delegate, and the model graph will be completely executed by the delegate.
The input model file size (MB): 6.35428
Initialized session in 493.147ms.
INFO: Created 1 GPU delegate kernels.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=31 first=49690 curr=16335 min=12114 max=49690 avg=15935.1 std=6269

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=59 first=16829 curr=16825 min=15340 max=18993 avg=16181.1 std=539

Inference timings in us: Init: 493147, First inference: 49690, Warmup (avg): 15935.1, Inference (avg): 16181.1
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=105.855 overall=114.68
Profiling Info for Benchmark Initialization:
============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
ModifyGraphWithDelegate 0.000 488.305 488.305 99.987% 99.987% 104548.000 1 ModifyGraphWithDelegate/0
AllocateTensors 488.283 0.061 0.031 0.013% 100.000% 0.000 2 AllocateTensors/0

============================== Top by Computation Time ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
ModifyGraphWithDelegate 0.000 488.305 488.305 99.987% 99.987% 104548.000 1 ModifyGraphWithDelegate/0
AllocateTensors 488.283 0.061 0.031 0.013% 100.000% 0.000 2 AllocateTensors/0

Number of nodes executed: 2
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
ModifyGraphWithDelegate 1 488.305 99.987% 99.987% 104548.000 1
AllocateTensors 1 0.062 0.013% 100.000% 0.000 2

Timings (microseconds): count=1 curr=488367
Memory (bytes): count=0
2 nodes observed

Operator-wise Profiling Info for Regular Benchmark Runs:
============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
TfLiteGpuDelegateV2 0.032 16.734 16.100 100.000% 100.000% 0.000 1 [StatefulPartitionedCall:0]:87

============================== Top by Computation Time ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
TfLiteGpuDelegateV2 0.032 16.734 16.100 100.000% 100.000% 0.000 1 [StatefulPartitionedCall:0]:87

Number of nodes executed: 1
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
TfLiteGpuDelegateV2 1 16.100 100.000% 100.000% 0.000 1

Timings (microseconds): count=59 first=16734 curr=16749 min=15261 max=18911 avg=16100.1 std=540
Memory (bytes): count=0
1 nodes observed
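
To put the two runs side by side, the "Inference timings in us" summary lines can be compared directly. The sketch below is only an illustration: BenchmarkSpeedup and its methods are hypothetical names, and it assumes the summary line format shown in the two logs above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a TensorFlow Lite API): compares benchmark summary lines. */
final class BenchmarkSpeedup {
    // Captures the number after "Inference (avg): ", with or without a fractional part.
    private static final Pattern AVG =
            Pattern.compile("Inference \\(avg\\): ([0-9]+(?:\\.[0-9]+)?)");

    /** Extracts the average inference time in microseconds from a summary line. */
    static double averageInferenceMicros(String timingsLine) {
        Matcher m = AVG.matcher(timingsLine);
        if (!m.find()) {
            throw new IllegalArgumentException("No 'Inference (avg)' field in: " + timingsLine);
        }
        return Double.parseDouble(m.group(1));
    }

    /** Returns how many times faster the candidate run is than the baseline run. */
    static double speedup(String baselineLine, String candidateLine) {
        return averageInferenceMicros(baselineLine) / averageInferenceMicros(candidateLine);
    }

    public static void main(String[] args) {
        // Summary lines copied from the NNAPI and GPU runs in this thread.
        String nnapi = "Inference timings in us: Init: 34939, First inference: 130492, "
                + "Warmup (avg): 116523, Inference (avg): 112533";
        String gpu = "Inference timings in us: Init: 493147, First inference: 49690, "
                + "Warmup (avg): 15935.1, Inference (avg): 16181.1";
        System.out.printf("GPU speedup over the NNAPI run: %.2fx%n", speedup(nnapi, gpu));
    }
}
```

With the numbers above, the GPU run comes out roughly 7 times faster than the NNAPI run. Note that the GPU run also has a much longer initialization (493 ms versus 35 ms).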

I don’t know whether this fix also applies to NNAPI or whether the warning is still a false positive.

As far as I understand, this is related to custom-written delegates, but we are trying to launch the default one.
It seems that by adding OpenCL and using the alowStageLoss flag we got NNAPI working on the Pixel 6 with the float16 architecture.

However, it still does not work on all devices and architectures.
Does anyone have any suggestions?

Have you tried enabling setUseNnapiCpu?

You could also check in the log which ops are available on the specific device:

Sorry, guys: most of our team is in Kharkiv, Ukraine, and we have other priorities right now. We will try to check it a bit later, when we are able to.

We ended up creating a profile table (different input sizes and architectures) for different devices, with a hard-coded GPU delegate.
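
For readers who want to try the same approach, here is a rough sketch of what such a profile table could look like in Java. It is an assumption about the shape of the data, not our actual implementation: GpuProfileTable, its methods, and the key layout (device model, architecture, input size) are all hypothetical, and the values would come from your own measurements with the GPU delegate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalDouble;

/** Hypothetical profile table: measured GPU-delegate latency per device, architecture and input size. */
final class GpuProfileTable {
    // Key: "deviceModel|architecture|inputSize"; value: average inference time in microseconds.
    private final Map<String, Double> avgLatencyMicros = new HashMap<>();

    private static String key(String deviceModel, String architecture, int inputSize) {
        return deviceModel + "|" + architecture + "|" + inputSize;
    }

    /** Stores (or overwrites) a measured average latency for one configuration. */
    void record(String deviceModel, String architecture, int inputSize, double micros) {
        avgLatencyMicros.put(key(deviceModel, architecture, inputSize), micros);
    }

    /** Returns the recorded latency, or an empty value if this configuration was never profiled. */
    OptionalDouble lookup(String deviceModel, String architecture, int inputSize) {
        Double value = avgLatencyMicros.get(key(deviceModel, architecture, inputSize));
        return value == null ? OptionalDouble.empty() : OptionalDouble.of(value);
    }
}
```

An empty lookup result signals a configuration that has not been measured yet, so the app can fall back to a default input size instead of guessing.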