The usual practice for using a Vision Transformer model on images with a resolution different from the training one is as follows. Say we are inferring on 480x480 images as opposed to the 224x224 training resolution.
The learned positional embeddings (or sin/cosine or relative positional bias embeddings) are interpolated to match the target resolution. While this is trivial to implement in code, in TensorFlow it turns out to be non-trivial, particularly when one tries to serialize the model with `model.save()`.
The following notebook implements a Vision Transformer model, including the vanilla recipes introduced in [1] such as learned positional embeddings, the use of a class token, etc. It also has a function to interpolate the positional embeddings when the input resolution differs from the training one.
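For reference, the interpolation step itself can be sketched like this (a minimal sketch; the function name and the `(1, 1 + N, D)` embedding layout are my assumptions, not the notebook's exact code):

```
import math

import tensorflow as tf


def interpolate_pos_embeddings(pos_embed, new_grid_size):
    """Resize ViT positional embeddings to a new patch grid.

    Assumes `pos_embed` has shape (1, 1 + N, D): one class-token embedding
    followed by N patch embeddings laid out on a square grid.
    """
    cls_embed, patch_embed = pos_embed[:, :1], pos_embed[:, 1:]
    old_size = int(math.sqrt(patch_embed.shape[1]))

    # Reshape the flat patch embeddings to a 2D grid, resize bilinearly,
    # then flatten back to a sequence.
    patch_embed = tf.reshape(patch_embed, (1, old_size, old_size, -1))
    patch_embed = tf.image.resize(patch_embed, size=new_grid_size, method="bilinear")
    patch_embed = tf.reshape(patch_embed, (1, new_grid_size[0] * new_grid_size[1], -1))
    return tf.concat([cls_embed, patch_embed], axis=1)
```

For 16x16 patches, going from 224x224 to 480x480 inputs means resizing a 14x14 grid of embeddings to a 30x30 one.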
I’ve even tried to decorate the `call()` method with `tf.function` (see below), but it doesn’t help either.
```
@tf.function(
    input_signature=[tf.TensorSpec([None, None, None, 3], tf.float32)]
)
```
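For context, this is roughly how the decorator was attached (`ViTClassifier` and its body are placeholders, not the actual notebook code):

```
class ViTClassifier(tf.keras.Model):
    # ... layers defined in __init__ ...

    @tf.function(
        input_signature=[tf.TensorSpec([None, None, None, 3], tf.float32)]
    )
    def call(self, inputs):
        # ... patch projection, positional-embedding interpolation, encoder ...
        return inputs  # placeholder body
```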
Any workarounds?
Cc @ariG23498
Model saving is the issue here (and not how to implement the interpolation part in TensorFlow).
Bhack
May 10, 2022, 11:14am
OK, you hadn’t explained in the description that the problem was related to saving/serialization (it was only in the Colab).
Also, if you look, their implementation saves the model. I suppose it does so without problems, but I have not tested it personally.
I haven’t looked at your repo implementation in detail, but I think you are in the same situation as:
(Linked GitHub issue: opened 27 Apr 2022, closed 13 Jan 2023; labels: type:bug/performance, stat:awaiting response from contributor, stale.)
I run into the above warning when saving sub-classed models with `model.save()`.…
After some investigation, the error seems to occur when using sub-classed layers in combination with sub-classed models.
See the code below for a minimal example:
```
import tensorflow as tf


class CustomLayer(tf.keras.layers.Layer):
    def __init__(self, units=2):
        super(CustomLayer, self).__init__()
        self.units = units
        self.layer = tf.keras.layers.Dense(units)

    def get_config(self):
        config = super(CustomLayer, self).get_config()
        config['units'] = self.units
        return config

    def call(self, x, **kwargs):
        x = self.layer(x)
        return x


class CustomModel(tf.keras.Model):
    def __init__(self, hidden_units):
        super(CustomModel, self).__init__()
        self.hidden_units = hidden_units
        self.dense_layers = [CustomLayer(u) for u in hidden_units]

    def call(self, inputs):
        x = inputs
        for layer in self.dense_layers:
            x = layer(x)
        return x

    def get_config(self):
        return {"hidden_units": self.hidden_units}

    @classmethod
    def from_config(cls, config):
        return cls(**config)


model = CustomModel([16, 16, 10])
print(model(tf.random.uniform((1, 5))).shape)
model.compile()
model.save("my_model")
print('---')
loaded_model = tf.keras.models.load_model('my_model')
```
Or check out this [gist](https://colab.research.google.com/gist/mhorlacher/e172e2dc8ec10fc790cf1c3fa1931e8a/untitled8.ipynb).
My TF version is `2.8.0`.
That’s a warning and I am aware of it.
Bhack:
Also, if you look, their implementation saves the model. I suppose it does so without problems, but I have not tested it personally.
Which one are you referring to?
Yup, sorry. Just edited the post description.
Bhack
May 10, 2022, 11:55am
The same repo. There are some save and interpolate tests to check:
```
import tempfile

import numpy as np
import pytest
import tensorflow as tf

from tfimm.models.factory import create_model, create_preprocessing, transfer_weights

from .architectures import TEST_ARCHITECTURES  # noqa: F401

# Models for which we cannot change the input size during model creation. Examples
# are some MLP models, where the number of patches becomes the number of filters
# for convolutional kernels.
FIXED_SIZE_MODELS_CREATION = [
    "mixer_test_model",  # mlp_mixer.py
    "resmlp_test_model",
    "gmlp_test_model",
]

# Models for which we cannot change the input size during inference.
FIXED_SIZE_MODELS_INFERENCE = [
    # ... (file truncated; see the original)
```
Bhack
May 10, 2022, 12:12pm
What is your problem exactly?
I don’t see much in the Colab other than:
WARNING:tensorflow:No training configuration found in save file, so the model was not compiled. Compile it manually.
Since your model was not compiled, why don’t you load it with:
vit_dino_base_loaded = tf.keras.models.load_model("vit_dino_base", compile=False)
If you run the Colab with the decorator enabled, the model instantiation itself won’t run.
Without it, things work okay, but here’s the problem: after the model is instantiated and called on some inputs, it fixes its input shapes and is therefore unable to operate on inputs with different resolutions.
Sorry for not making that clear.
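Schematically, the failure mode is this (`ViTClassifier` is a placeholder for the notebook’s subclassed model):

```
model = ViTClassifier()                 # hypothetical subclassed ViT
_ = model(tf.zeros((1, 224, 224, 3)))   # first call builds the model at 224x224
model.save("vit_dino_base")

loaded = tf.keras.models.load_model("vit_dino_base", compile=False)
_ = loaded(tf.zeros((1, 480, 480, 3)))  # fails: the serialized call signature is fixed
```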
Bhack
May 10, 2022, 2:00pm
The input shape needs to be defined for build/save, so in your case you cannot change the input shape after loading as-is.
You need to use a sort of transfer-weights approach:
```
def test_change_input_size_inference(model_name):
    """
    We test if we can run inference with different input sizes.
    """
    model = create_model(model_name)
    # For transformer models we need to specify the `interpolate_input` parameter.
    # Models that don't have the parameter will ignore it.
    flexible_model = create_model(model_name, interpolate_input=True)
    transfer_weights(model, flexible_model)

    # First we test if setting `interpolate_input=True` doesn't change the output
    # for the original input size.
    rng = np.random.default_rng(2021)
    img = rng.random(
        size=(1, *model.cfg.input_size, model.cfg.in_channels), dtype="float32"
    )
    res_1 = model(img)
    res_2 = flexible_model(img)
    assert (np.max(np.abs(res_1 - res_2))) / (np.max(np.abs(res_1)) + 1e-6) < 1e-6
    # ... (file truncated; see the original)
```
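In spirit, the transfer-weights approach amounts to something like this toy sketch (not tfimm’s actual `transfer_weights`; variables with mismatched shapes, such as position embeddings, would need interpolation rather than a plain copy):

```
def transfer_weights_by_name(src_model, dst_model):
    """Copy variables from src_model to dst_model by name (toy sketch)."""
    src = {v.name: v for v in src_model.weights}
    for v in dst_model.weights:
        w = src.get(v.name)
        # Skip shape mismatches; position embeddings would be interpolated here.
        if w is not None and v.shape == w.shape:
            v.assign(w)
```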
See also:
Transformer Models
==================
Here we describe some features that are common to most transformer models.
Changing input shape
--------------------
Most parts of a transformer architecture are independent of input resolution. Changing
the input resolution results in a different number of patches. The projection,
self-attention and MLP layers work on arbitrary length inputs. The only part that needs
to be adapted are the position embeddings.
Position embeddings can be adjusted via 2D interpolation to the new input resolution.
However, since position embeddings are learnt, after interpolation they may no longer
be meaningful. Thus, by default, transformer models can only run inference at the
resolution specified by ``input_size``.
If we want to fine-tune a model at a different resolution, we can specify the new
resolution when creating the model. In that case, the position embeddings will be
(File truncated; see the original documentation.)
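If I read the docs right, fine-tuning at a new resolution would then look roughly like this (passing `input_size` as a `create_model` keyword is my assumption based on the docs above):

```
import tfimm

# Create the model at the new resolution; per the docs, the pretrained
# position embeddings are interpolated to the new patch grid.
model = tfimm.create_model(
    "vit_tiny_patch16_224", pretrained="timm", input_size=(384, 384)
)
```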
Thanks for the pointers. Appreciate your help.
Cc: @ariG23498