The usual practice for using a Vision Transformer model on images with a resolution different from the training one is as follows. Say we are inferring on 480x480 images as opposed to the 224x224 training resolution.
The learned positional embeddings (or sin/cosine or relative positional bias embeddings) are interpolated to match the target resolution. While this is trivial to implement in code, in TensorFlow it turns out to be non-trivial, particularly when one tries to serialize the model with `model.save()`.
The following notebook implements a Vision Transformer model, including the vanilla recipes introduced in [1] such as learned positional embeddings, the use of a class token, etc. It also has a function to interpolate the positional embeddings when the input resolution differs from the training one.
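For reference, the interpolation step itself can be sketched like this (a minimal sketch; the function name and the `(1, 1 + N, D)` embedding layout are my assumptions, not the notebook's exact code):

```
import math

import tensorflow as tf


def interpolate_pos_embeddings(pos_embed, new_grid_size):
    """Resize ViT positional embeddings to a new patch grid.

    Assumes `pos_embed` has shape (1, 1 + N, D): one class-token embedding
    followed by N patch embeddings laid out on a square grid.
    """
    cls_embed, patch_embed = pos_embed[:, :1], pos_embed[:, 1:]
    old_size = int(math.sqrt(patch_embed.shape[1]))

    # Reshape the flat patch embeddings to a 2D grid, resize bilinearly,
    # then flatten back to a sequence.
    patch_embed = tf.reshape(patch_embed, (1, old_size, old_size, -1))
    patch_embed = tf.image.resize(patch_embed, size=new_grid_size, method="bilinear")
    patch_embed = tf.reshape(patch_embed, (1, new_grid_size[0] * new_grid_size[1], -1))
    return tf.concat([cls_embed, patch_embed], axis=1)
```

For 16x16 patches, going from 224x224 to 480x480 inputs means resizing a 14x14 grid of embeddings to a 30x30 one.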
I’ve even tried to decorate the `call()` method with `tf.function` (see below), but it doesn’t help either.
```
@tf.function(
    input_signature=[tf.TensorSpec([None, None, None, 3], tf.float32)]
)
```
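For context, this is roughly how the decorator was attached (`ViTClassifier` and its body are placeholders, not the actual notebook code):

```
class ViTClassifier(tf.keras.Model):
    # ... layers defined in __init__ ...

    @tf.function(
        input_signature=[tf.TensorSpec([None, None, None, 3], tf.float32)]
    )
    def call(self, inputs):
        # ... patch projection, positional-embedding interpolation, encoder ...
        return inputs  # placeholder body
```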
Any workarounds?
Cc @ariG23498
Model saving is the issue here (and not how to implement the interpolation part in TensorFlow).
Bhack
May 10, 2022, 11:14am
OK, you hadn’t explained in the description that the problem was related to saving/serialization (it was only in the Colab).
Also, if you look, their implementation saves the model. I suppose it does so without problems, but I have not tested it personally.
I haven’t looked at your repo implementation in detail, but I think you are in the same situation as:
(Linked GitHub issue: opened 27 Apr 2022, closed 13 Jan 2023; labels: type:bug/performance, stat:awaiting response from contributor, stale.)
I run into the above warning when saving sub-classed models with `model.save()`.…
After some investigation, the error seems to occur when using sub-classed layers in combination with sub-classed models.
See the code below for a minimal example:
```
import tensorflow as tf


class CustomLayer(tf.keras.layers.Layer):
    def __init__(self, units=2):
        super(CustomLayer, self).__init__()
        self.units = units
        self.layer = tf.keras.layers.Dense(units)

    def get_config(self):
        config = super(CustomLayer, self).get_config()
        config['units'] = self.units
        return config

    def call(self, x, **kwargs):
        x = self.layer(x)
        return x


class CustomModel(tf.keras.Model):
    def __init__(self, hidden_units):
        super(CustomModel, self).__init__()
        self.hidden_units = hidden_units
        self.dense_layers = [CustomLayer(u) for u in hidden_units]

    def call(self, inputs):
        x = inputs
        for layer in self.dense_layers:
            x = layer(x)
        return x

    def get_config(self):
        return {"hidden_units": self.hidden_units}

    @classmethod
    def from_config(cls, config):
        return cls(**config)


model = CustomModel([16, 16, 10])
print(model(tf.random.uniform((1, 5))).shape)
model.compile()
model.save("my_model")
print('---')
loaded_model = tf.keras.models.load_model('my_model')
```
Or check out this [gist](https://colab.research.google.com/gist/mhorlacher/e172e2dc8ec10fc790cf1c3fa1931e8a/untitled8.ipynb).
My TF version is `2.8.0`.
That’s a warning and I am aware of it.
Bhack:
Also, if you look, their implementation saves the model. I suppose it does so without problems, but I have not tested it personally.
Which one are you referring to?
Yup, sorry. Just edited the post description.
Bhack
May 10, 2022, 11:55am
The same repo. There are some save and interpolate tests to check:
```
import tempfile

import numpy as np
import pytest
import tensorflow as tf

from tfimm.models.factory import create_model, create_preprocessing, transfer_weights

from .architectures import TEST_ARCHITECTURES  # noqa: F401

# Models for which we cannot change the input size during model creation. Examples
# are some MLP models, where the number of patches becomes the number of filters
# for convolutional kernels.
FIXED_SIZE_MODELS_CREATION = [
    "mixer_test_model",  # mlp_mixer.py
    "resmlp_test_model",
    "gmlp_test_model",
]

# Models for which we cannot change the input size during inference.
FIXED_SIZE_MODELS_INFERENCE = [
    # ... (file truncated; see the original)
```
Bhack
May 10, 2022, 12:12pm
What is your problem exactly?
I don’t see much in the Colab other than:
WARNING:tensorflow:No training configuration found in save file, so the model was not compiled. Compile it manually.
Since your model was not compiled, why don’t you load it with:
vit_dino_base_loaded = tf.keras.models.load_model("vit_dino_base", compile=False)
If you run the Colab with the decorator enabled, the model instantiation itself won’t run.
Without it, things work okay, but here’s the problem: after the model is instantiated and called on some inputs, it fixes its input shapes and is therefore unable to operate on inputs with different resolutions.
Sorry for not making that clear.
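Schematically, the failure mode is this (`ViTClassifier` is a placeholder for the notebook’s subclassed model):

```
model = ViTClassifier()                 # hypothetical subclassed ViT
_ = model(tf.zeros((1, 224, 224, 3)))   # first call builds the model at 224x224
model.save("vit_dino_base")

loaded = tf.keras.models.load_model("vit_dino_base", compile=False)
_ = loaded(tf.zeros((1, 480, 480, 3)))  # fails: the serialized call signature is fixed
```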
Bhack
May 10, 2022, 2:00pm
The input shape needs to be defined for build/save, so in your case you cannot change the input shape after loading as-is.
You need to use a sort of transfer-weights approach:
```
def test_change_input_size_inference(model_name):
    """
    We test if we can run inference with different input sizes.
    """
    model = create_model(model_name)
    # For transformer models we need to specify the `interpolate_input` parameter.
    # Models that don't have the parameter will ignore it.
    flexible_model = create_model(model_name, interpolate_input=True)
    transfer_weights(model, flexible_model)

    # First we test if setting `interpolate_input=True` doesn't change the output
    # for the original input size.
    rng = np.random.default_rng(2021)
    img = rng.random(
        size=(1, *model.cfg.input_size, model.cfg.in_channels), dtype="float32"
    )
    res_1 = model(img)
    res_2 = flexible_model(img)
    assert (np.max(np.abs(res_1 - res_2))) / (np.max(np.abs(res_1)) + 1e-6) < 1e-6
    # ... (file truncated; see the original)
```
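In spirit, the transfer-weights approach amounts to something like this toy sketch (not tfimm’s actual `transfer_weights`; variables with mismatched shapes, such as position embeddings, would need interpolation rather than a plain copy):

```
def transfer_weights_by_name(src_model, dst_model):
    """Copy variables from src_model to dst_model by name (toy sketch)."""
    src = {v.name: v for v in src_model.weights}
    for v in dst_model.weights:
        w = src.get(v.name)
        # Skip shape mismatches; position embeddings would be interpolated here.
        if w is not None and v.shape == w.shape:
            v.assign(w)
```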
See also:
Transformer Models
==================
Here we describe some features that are common to most transformer models.
Changing input shape
--------------------
Most parts of a transformer architecture are independent of input resolution. Changing
the input resolution results in a different number of patches. The projection,
self-attention and MLP layers work on arbitrary length inputs. The only part that needs
to be adapted are the position embeddings.
Position embeddings can be adjusted via 2D interpolation to the new input resolution.
However, since position embeddings are learnt, after interpolation they may no longer
be meaningful. Thus, by default, transformer models can only run inference at the
resolution specified by ``input_size``.
If we want to fine-tune a model at a different resolution, we can specify the new
resolution when creating the model. In that case, the position embeddings will be
(File truncated; see the original documentation.)
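If I read the docs right, fine-tuning at a new resolution would then look roughly like this (passing `input_size` as a `create_model` keyword is my assumption based on the docs above):

```
import tfimm

# Create the model at the new resolution; per the docs, the pretrained
# position embeddings are interpolated to the new patch grid.
model = tfimm.create_model(
    "vit_tiny_patch16_224", pretrained="timm", input_size=(384, 384)
)
```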
Thanks for the pointers. Appreciate your help.
Cc: @ariG23498