Hi, thank you for taking the time to answer.
I am making a plugin for a new type of hardware accelerator.
Currently I am not talking directly to the accelerator; instead, the kernels send instructions to a simulation via RPC, where I can trace everything that happens.
When I run a SavedModel with the plugin, I get a meaningful trace. Here is the script:
import tensorflow as tf
shape = (1, 32, 32, 3)
model = tf.keras.models.load_model('./model_zoo/retinanet_resnet50_v1_fpn_640x640_1')
x = tf.random.uniform(shape)
y = model(x)
The trace output from the RPC client looks like:
(...)
Send op Relu ( 3) Inputs: mem05eb Outputs: mem05ee
Send op Conv2D ( 4) Inputs: mem05ee mem05ef Outputs: mem05f0
Send op BiasAdd ( 4) Inputs: mem05f0 mem05f1 Outputs: mem05f2
Send op Conv2D ( 5) Inputs: mem05d6 mem05f3 Outputs: mem05f4
Send op BiasAdd ( 5) Inputs: mem05f4 mem05f5 Outputs: mem05f6
Send op FusedBat ( 4) Inputs: mem05f6 mem05f7 mem05f8 mem05f9 mem05fa Outputs: mem05fb mem05fc mem05fd
Send op FusedBat ( 5) Inputs: mem05f2 mem05fe mem05ff mem0600 mem0601 Outputs: mem0602 mem0603 mem0604
Send op AddV2 ( 55) Inputs: mem05fb mem0602 Outputs: mem0605
(...)
I also see a bunch of Fill and other operations, but I will skip the details.
When running a hub model like in this script:
import tensorflow as tf
import tensorflow_hub as hub
model = hub.load('https://tfhub.dev/tensorflow/retinanet/resnet50_v1_fpn_640x640/1')
shape = (1, 32, 32, 3)
input = tf.random.normal(shape)
y = model(input)
I see no Conv2D … It starts with a bunch of Identity and AssignVariableOp, then one Mul and one AddV2, and that's it.
(...)
Send op Identity ( 460) Inputs: mem03a6 Outputs: mem03a7
(...)
RandomStandardNormal in device /job:localhost/replica:0/task:0/device:CPU:0
Send op Mul ( 1) Inputs: mem03b0 mem03af Outputs: mem03b1
Send op AddV2 ( 1) Inputs: mem03b1 mem03ae Outputs: mem03b2
The kernels registered in the plugin send instructions directly to the RPC client.
What I would expect to see from the second test is the same trace as the first one (with Conv2D, BiasAdd, etc.).
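To make the comparison between the two traces concrete, here is a minimal sketch that tallies op names from the "Send op" lines and lists the ops that appear in one trace but not the other. It is not part of the plugin; the line format is only assumed from the excerpts in this post, and the names `count_ops` and `missing_ops` are hypothetical helpers made up for illustration.

```python
import re
from collections import Counter

# Line format assumed from the trace excerpts in this post, e.g.
# "Send op Conv2D ( 4) Inputs: mem05ee mem05ef Outputs: mem05f0"
SEND_OP = re.compile(r"^Send op\s+(\S+)\s*\(\s*\d+\)\s*Inputs:(.*?)Outputs:(.*)$")


def count_ops(trace_text):
    """Tally how many 'Send op' lines appear per op name (hypothetical helper)."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = SEND_OP.match(line.strip())
        if match:  # lines that are not 'Send op' lines are ignored
            counts[match.group(1)] += 1
    return counts


def missing_ops(reference, other):
    """Op names present in the reference trace but absent from the other one."""
    return sorted(set(reference) - set(other))


# Short excerpts copied from the two traces above.
saved_model_trace = """\
Send op Relu ( 3) Inputs: mem05eb Outputs: mem05ee
Send op Conv2D ( 4) Inputs: mem05ee mem05ef Outputs: mem05f0
Send op BiasAdd ( 4) Inputs: mem05f0 mem05f1 Outputs: mem05f2
"""
hub_trace = """\
Send op Identity ( 460) Inputs: mem03a6 Outputs: mem03a7
RandomStandardNormal in device /job:localhost/replica:0/task:0/device:CPU:0
Send op Mul ( 1) Inputs: mem03b0 mem03af Outputs: mem03b1
Send op AddV2 ( 1) Inputs: mem03b1 mem03ae Outputs: mem03b2
"""

print(missing_ops(count_ops(saved_model_trace), count_ops(hub_trace)))
```

Running this over the full trace files rather than the excerpts would show exactly which op types never reach the RPC client in the hub case.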
I might just be using the hub api wrong.
Side note about the environment: I am running everything in an ARM Docker container (I am developing on Apple silicon). The image is the latest one from https://hub.docker.com/r/armswdev/tensorflow-arm-neoverse, and it uses TensorFlow 2.8.
It shouldn’t be the issue though.