What is needed to run a hub model?

Hello,

I am currently writing a plugin for a pluggable device. I can run a ResNet50 with my plugin and trace the execution of each operation in it.
However, when I load a ResNet50-based model from the hub and run it, I don’t see anything being executed in my plugin.
Based on this answer, running a TensorFlow Hub model should be similar to running a local SavedModel, which actually works with my plugin without any extra work.

So my question is: what do I need to implement in order for my plugin to work with hub models?

And if there are resources available on the subject (hub models, key differences with SavedModel, hub models for plugin developers), I would be very grateful to anyone sharing them.

Sorry, but I am not sure I understand you. What do you mean?

I ran into a general issue of this kind last time.

Does anyone have an idea why that is?

Looping in @lgusm :+1: :+1:

A model from TensorFlow Hub is exactly the same as any other TensorFlow model; there’s no difference.
Which format are you trying to use? SavedModel, TFLite, TF.js?

Where is the model running? What kind of plugin are you building?

Hi, thank you for taking the time to answer.

I am making a plugin for a new type of hardware accelerator.
Currently I am not talking directly to the accelerator; instead, the kernels send instructions to a simulation via RPC, where I can trace everything that happens.

When I run a SavedModel with the plugin, I get a meaningful trace. Here is the script:

import tensorflow as tf

shape = (1, 32, 32, 3)
# Load the manually downloaded SavedModel through the Keras API.
model = tf.keras.models.load_model('./model_zoo/retinanet_resnet50_v1_fpn_640x640_1')
x = tf.random.uniform(shape)
y = model(x)

The trace output from the RPC client looks like:

(...)
Send op     Relu (   3)  Inputs: mem05eb  Outputs: mem05ee
Send op   Conv2D (   4)  Inputs: mem05ee mem05ef  Outputs: mem05f0
Send op  BiasAdd (   4)  Inputs: mem05f0 mem05f1  Outputs: mem05f2
Send op   Conv2D (   5)  Inputs: mem05d6 mem05f3  Outputs: mem05f4
Send op  BiasAdd (   5)  Inputs: mem05f4 mem05f5  Outputs: mem05f6
Send op FusedBat (   4)  Inputs: mem05f6 mem05f7 mem05f8 mem05f9 mem05fa  Outputs: mem05fb mem05fc mem05fd
Send op FusedBat (   5)  Inputs: mem05f2 mem05fe mem05ff mem0600 mem0601  Outputs: mem0602 mem0603 mem0604
Send op    AddV2 (  55)  Inputs: mem05fb mem0602  Outputs: mem0605
(...)

I also see a bunch of Fill and other operations, but I’ll skip the details.

When running a hub model like in this script:

import tensorflow as tf
import tensorflow_hub as hub

model = hub.load('https://tfhub.dev/tensorflow/retinanet/resnet50_v1_fpn_640x640/1')
shape = (1, 32, 32, 3)
x = tf.random.normal(shape)  # renamed from `input` to avoid shadowing the builtin
y = model(x)

I see no Conv2D.
It starts with a bunch of Identity and AssignVariableOp operations, then one Mul and one AddV2, and that’s it.

(...)
Send op Identity ( 460)  Inputs: mem03a6  Outputs: mem03a7
(...)
RandomStandardNormal in device /job:localhost/replica:0/task:0/device:CPU:0
Send op      Mul (   1)  Inputs: mem03b0 mem03af  Outputs: mem03b1
Send op    AddV2 (   1)  Inputs: mem03b1 mem03ae  Outputs: mem03b2

The kernels registered in the plugin send instructions directly to the RPC client.

What I would expect to see from the second test is the same trace (with Conv2D, BiasAdd, etc.) as in the first one.

I might just be using the hub api wrong.

Side note about the environment: I am running everything in an ARM Docker container (I am developing on Apple silicon). The image is the latest one from https://hub.docker.com/r/armswdev/tensorflow-arm-neoverse, and it uses TensorFlow 2.8.
It shouldn’t be the issue though.

So the problem is different from what I understood.

The model works, it’s just showing you different operations, is that it?

Models from TF Hub are in the SavedModel format, which is the raw format closest to the operations used. Loading a Keras model will load and show the layers used (if the model was built using Keras, which is not always the case).

I imagine that both models will give the exact same results (they are in fact the same model); it’s just what the trace is showing that differs.

If the model on the hub was a Keras model, you can load it using the Keras method:

  • hub.load the model
  • call the hub.resolve method on the same URL. This will give you the path where the model is saved locally.
  • load the model from this path using the keras.models.load_model method you were using

But it won’t always work. If the model wasn’t created using the Keras API, it will give you some warnings.
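The steps above could be sketched roughly as follows. This is only a minimal sketch, assuming TF 2.x as used in this thread; the helper name load_hub_model_with_keras is made up for illustration. As far as I know, hub.resolve downloads and caches the model itself if it is not already cached.

```python
def load_hub_model_with_keras(url):
    """Resolve a TF Hub URL to its local cache directory and load it through Keras.

    Hypothetical helper for illustration, not part of the tensorflow_hub API.
    """
    # Imported lazily so the helper can be defined without TensorFlow installed.
    import tensorflow as tf
    import tensorflow_hub as hub

    # Downloads and caches the model if needed, then returns the local directory.
    local_path = hub.resolve(url)
    # May emit warnings (or fail) if the model was not created with a Keras API.
    model = tf.keras.models.load_model(local_path)
    return model, local_path


# Usage (needs network access and tensorflow_hub, so left commented out):
# model, path = load_hub_model_with_keras(
#     'https://tfhub.dev/tensorflow/retinanet/resnet50_v1_fpn_640x640/1')
# print(path)
```

Returning the local path as well makes it easy to load the same directory with tf.saved_model.load and compare the two traces.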

some more info here:

The trace shows what TensorFlow is calling in the plugin.

The problem is that I don’t see the same kernels being called depending on whether I load a TensorFlow Hub model through the hub API or load the exact same model (from TensorFlow Hub) using a manual download + tf.keras.models.load_model.
While there could be some minor differences, the real problem is that no Conv2D and no BiasAdd (nothing relevant to the structure of the network) is called in the plugin by TensorFlow when using the hub API.

So my question is what am I missing?
Why doesn’t TensorFlow call the Conv2D (and other) kernels when I use the hub API, while it does call them when I manually download the model from the hub and use the Keras/SavedModel API?

I saw in tutorials that the user should load the model/layer with model = hub.KerasLayer(url) as in this example. So I ran another test using that API from the example and I am facing the same issue (weird trace, TensorFlow doesn’t call any Conv2D kernel from the plugin):

model = tf.keras.Sequential([tf.keras.Input(shape), hub.KerasLayer(url)])

My goal really is for my plugin to support tensorflow hub models seamlessly, and while I understand that tensorflow hub models are just SavedModel wrappers, I don’t get the expected behaviour when using the hub api as a normal user would.

I think the reason is that a Conv2D operation is in fact an aggregate of many smaller operations.
When you look at the TF Hub model trace, you’re seeing the smaller ops instead of the aggregated one (which you see when loading a Keras model).

One way to validate this would be to look at the Conv2D implementation and see which core ops it uses.

Does it make sense?

Yes it makes sense, however I am highly confident that this is not the case - based on the knowledge I gathered working on this plugin, but also because the only mathematical kernels I see being called are one single AddV2 and one single Mul.
Also I don’t see any activation layers like Relu (or Relu6, I can’t remember which one is being used), those could be said to be mathematically more elementary than Conv2D.
And again, in my experiment I am running the exact same SavedModel (pulled from the same URL). The only difference is that one has been downloaded manually and loaded with the SavedModel API, and the other one has been loaded directly with the TensorFlow Hub API.

So I believe these points are enough to prove that it can’t be the case.

I can’t replicate the tracing myself, but one thing you can try is to use

model = tf.saved_model.load(path_to_dir)

on the model you downloaded outside of hub and see if its tracing is the same as what you get when loading from hub.
That’s very similar to what the hub library does.

Another way of validating whether the operations are the same is to try both models on the same input and compare the results. They should be the same no matter how you load them.
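The comparison can also be made on the graphs themselves, independently of the plugin. Below is a minimal sketch that counts op types in a GraphDef, including the nodes inside its function library (SavedModel signatures typically wrap the real work in library functions, so the top-level nodes alone say little). The helper name count_op_types is made up for illustration, and the commented usage assumes the loaded model exposes a serving_default signature, which I have not checked for this particular model.

```python
from collections import Counter


def count_op_types(graph_def):
    """Count op types in a GraphDef, including nodes in its function library.

    Hypothetical helper for illustration; it only reads `node`, `library.function`
    and `node_def`, which are standard GraphDef / FunctionDef fields.
    """
    counts = Counter(node.op for node in graph_def.node)
    # Nested functions (e.g. behind StatefulPartitionedCall) live in the library.
    for function_def in graph_def.library.function:
        counts.update(node.op for node in function_def.node_def)
    return counts


# Usage (needs TensorFlow and the model, so left commented out):
# import tensorflow as tf
# import tensorflow_hub as hub
# for model in (hub.load(url), tf.saved_model.load(path_to_dir)):
#     fn = model.signatures["serving_default"]  # assumption: this signature exists
#     print(count_op_types(fn.graph.as_graph_def())["Conv2D"])
```

If both loading methods report the same number of Conv2D nodes, the graphs contain the same ops, and the difference in the trace would have to come from how or where the function is executed rather than from the model itself.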

I can’t replicate the tracing myself, but one thing you can try is to use
model = tf.saved_model.load(path_to_dir)
on the model you downloaded outside of hub and see if its tracing is the same as what you get when loading from hub.
That’s very similar to what the hub library does.

This is exactly what I explained I had already tried.
Let me say it again:
When downloading the model manually from the hub and loading it using

model = tf.saved_model.load(path)

I see Conv2D and other relevant layers being executed by my plugin.
But when I load the same model (from the same hub url) using the tf hub api:

model = hub.load(url)

I don’t get any relevant layer being executed by the plugin - this is what I am trying to figure out.

another way of validating if the operations are the same, is trying both models on the same input and comparing the results, they should be the same no matter how you load them

I’ve tested it, and both give the same results:

import numpy as np

# `args` and `hubmodel` come from the rest of my script (not shown here).
input_shape = (args.batch_size, *hubmodel.input_shape)
x = tf.cast(
    tf.experimental.numpy.random.randint(255, size=input_shape),
    tf.uint8
)

# Same input through the hub API...
model = hub.load(hubmodel.url)
y1 = model(x)

# ...and through the SavedModel API on the manually downloaded copy.
model = tf.saved_model.load('./model_zoo/retinanet_resnet50_v1_fpn_640x640_1')
y2 = model(x)

print(all(np.array_equal(val.numpy(), y2[key].numpy()) for key, val in y1.items()))
# Outputs True
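A slightly more defensive version of this check, as a sketch: it also verifies that both outputs have the same keys and shapes, and it allows an optional tolerance in case two devices do not produce bit-identical floats. The helper name outputs_match is made up for illustration, and the "boxes" / "scores" keys in the usage line are placeholders, not the model’s actual output names.

```python
import numpy as np


def outputs_match(y1, y2, rtol=0.0, atol=0.0):
    """Return True if two dicts of array-like outputs have equal keys, shapes and values.

    With the default rtol=atol=0 the values must be exactly equal.
    Hypothetical helper for illustration.
    """
    if set(y1) != set(y2):
        return False
    for key in y1:
        a, b = np.asarray(y1[key]), np.asarray(y2[key])
        # Check shapes explicitly, since np.allclose would broadcast silently.
        if a.shape != b.shape or not np.allclose(a, b, rtol=rtol, atol=atol):
            return False
    return True


# Placeholder data standing in for the two model outputs.
ref = {"boxes": np.array([1.0, 2.0]), "scores": np.array([0.5])}
print(outputs_match(ref, {"boxes": np.array([1.0, 2.0]), "scores": np.array([0.5])}))
```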

Since we can’t reproduce the behaviour easily, and I think it’s clear that the models are the same, the best suggestion I can offer is to look into the load method from TFHub:

sorry for the no-answer answer