I’m trying to implement the masking generation function for BEiT:
"""
Originally inspired by impl at https://github.com/zhunzhong07/Random-Erasing, Apache 2.0
Copyright Zhun Zhong & Liang Zheng
Hacked together by / Copyright 2020 Ross Wightman
Modified by Hangbo Bao, for generating the masked position for visual image transformer
"""
# --------------------------------------------------------
# BEIT: BERT Pre-Training of Image Transformers (https://arxiv.org/abs/2106.08254)
# Github source: https://github.com/microsoft/unilm/tree/master/beit
# Copyright (c) 2021 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# By Hangbo Bao
# Based on timm, DINO and DeiT code bases
# https://github.com/rwightman/pytorch-image-models/tree/master/timm
# Originally inspired by impl at https://github.com/zhunzhong07/Random-Erasing, Apache 2.0
# Copyright Zhun Zhong & Liang Zheng
#
# Hacked together by / Copyright 2020 Ross Wightman
The part I am struggling with is item assignment on EagerTensors, which TensorFlow does not support.
I have consulted references that show how to approach such assignments, but this one does not seem to fit them.
Any particular approaches I should try out or look into for this case?
Bhack
March 14, 2022, 11:26am
3
Is each masked patch chosen at random within a single image there?
A single mask can be applied to a batch too.
Thanks for sharing this. Will take a look.
@Bhack there’s actually no masking involved in the link you sent.
So, the question is pretty much still open.
Bhack
March 15, 2022, 10:20am
9
Yes, as they are just reloading the Microsoft weights, so there is no training protocol there.
What is your specific issue? Isn’t it just the standard image tokenization used in many vision transformers, where some tokens are masked?
My issue is with the block-wise masking strategy, where tensor assignment is apparently needed (refer to my initial post). Had the masking been purely random, it would have been easier; we implemented that a while back (here).
Bhack
March 15, 2022, 12:21pm
11
To mimic that implementation exactly, are you looking for slice assignment?
Yes. Please take note of this part before sharing existing references:
I have consulted references that show how to approach such assignments, but this one does not seem to fit them.
If there’s no way other than doing something like this, then it’s a different choice.
Bhack
March 15, 2022, 1:05pm
13
Oh, in that case: historically we have had plenty of slice assignment tickets. To mention just a few that are still open:
opened 09:28AM - 19 Jun 20 UTC
stat:awaiting tensorflower
type:feature
comp:ops
**System information**
- TensorFlow version (you are using): 2.2
- Are you willing to contribute it (Yes/No): Yes
**Describe the feature and the current behavior/state.**
I would like to have slice assignment for Tensor objects in TensorFlow.
The code I would like to write is:
```python
import tensorflow as tf
a = tf.constant([1, 2, 4, 5, 7, 3, 2, 6,])
indices = tf.constant([3, 4], dtype=tf.int32)
a[indices] += 1
```
Of course it's a simplistic example and doesn't cover everything I want to do (I would use it in more complex functions not with constants), and I am happy to make it more complex if necessary.
Currently this code gives the error:
```
TypeError: Only integers, slices (`:`), ellipsis (`...`), tf.newaxis (`None`) and scalar tf.int32/tf.int64 tensors are valid indices, got <tf.Tensor: shape=(2,), dtype=int32, numpy=array([3, 4], dtype=int32)>
```
**Will this change the current api? How?**
I guess this is a change of API since it introduces a new functionality.
**Who will benefit with this feature?**
A lot of people have been asking for this feature for example in this GitHub issues:
- https://github.com/tensorflow/tensorflow/issues/14132#issuecomment-483002522
- https://github.com/tensorflow/tensorflow/issues/33131
These issues have unfortunately been closed because some workarounds for specific use-cases have been found (ones where the slicing is fixed and you can use [masking](https://github.com/tensorflow/tensorflow/issues/14132#issuecomment-483002522) or [TensorArrays](https://github.com/tensorflow/tensorflow/issues/14132#issuecomment-487643287)).
Some other issues deal with `Variable`s which is not what I am talking about here. [Some workarounds do exist](https://stackoverflow.com/a/62202181/4332585) involving `Variable` but they seem hacky.
I will personally benefit from it, in the multiple places where I now use `tensor_scatter_nd_add` or `tensor_scatter_nd_update`, which is a solution that always works but is very difficult to write and very slow:
- [for a wavelet-based neural network, called MWCNN](https://github.com/zaccharieramzi/tf-mwcnn/blob/master/mwcnn.py#L106-L110);
- [for non-uniform fast fourier transform](https://github.com/zaccharieramzi/tfkbnufft/blob/master/tfkbnufft/nufft/interp_functions.py#L151);
- [for sensitivity map extraction when doing MRI reconstruction with TensorFlow neural networks](https://github.com/zaccharieramzi/fastmri-reproducible-benchmark/blob/master/fastmri_recon/data/utils/multicoil/smap_extract.py#L27-L35).
**Any Other info.**
The `tensor_scatter_nd_*` alternative might seem like a viable solution, but it suffers from 2 drawbacks that I consider huge:
- It is very difficult to write. It is actually so difficult, I decided to make a package that would alleviate this difficulty by having the different slicing possibilities unit tested: [tf-slice-assign](https://github.com/zaccharieramzi/tf-slice-assign).
- It is very slow. I made a [benchmark notebook](https://colab.research.google.com/drive/1gEjha7h1mhQkFwULS9MAU0bWQfzfEALY?usp=sharing) vs `pytorch` for slice assignment add. You can see that on GPU, using `tensor_scatter_nd_add` is 10 times slower than slice assignment in `pytorch` and 20 times slower on CPU. For a practical example, it means that my `tfkbnufft` (for non-uniform fast fourier transform) package is 30 times slower than its [torch counterpart](https://github.com/mmuckley/torchkbnufft#computation-speed) which I translated. This currently removes the possibility of training neural networks using the non-uniform fourier transform in TensorFlow.
opened 03:54AM - 08 Oct 19 UTC
stat:awaiting tensorflower
type:feature
comp:ops
TF 2.11
In NumPy or PyTorch we can do something like this, but how can it be done with TF 2.0?
The following code raises this exception:
`'tensorflow.python.framework.ops.EagerTensor' object does not support item assignment`
prediction[:,:,0]=tf.math.sigmoid(prediction[:,:,0])
opened 04:19PM - 18 Mar 19 UTC
closed 11:59AM - 24 Mar 22 UTC
stat:contribution welcome
stat:awaiting response
type:feature
stale
comp:ops
I was wondering if it is possible to implement NumPy-like slicing and updating, a[1:10,2:20....], in TensorFlow. It would make life much easier. Right now the code just gets bigger, uglier, and more bug-prone.
I have not checked the paper in detail to see which indices are selected for the masking. Couldn’t this be covered by `tf.tensor_scatter_nd_update` after populating those indices?
The indexing conditions are in the source code I provided.
If you know a workaround using scatter, would you mind providing a minimal working example?
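As a rough sketch of what the scatter route could look like for one rectangular block: build every (row, col) index inside the rectangle and hand them to `tf.tensor_scatter_nd_update`, which returns a new tensor rather than assigning in place. The helper name `set_block` and the 14x14 patch grid are illustrative assumptions, not taken from the BEiT code.

```python
import tensorflow as tf


def set_block(mask, top, left, h, w):
    """Return a copy of `mask` with the h x w block at (top, left) set to 1."""
    rows = tf.range(top, top + h)
    cols = tf.range(left, left + w)
    # Enumerate every (row, col) pair inside the rectangle: shape (h * w, 2).
    rr, cc = tf.meshgrid(rows, cols, indexing="ij")
    indices = tf.stack([tf.reshape(rr, [-1]), tf.reshape(cc, [-1])], axis=1)
    # One update value per index, all ones, in the dtype of the mask.
    updates = tf.ones_like(indices[:, 0], dtype=mask.dtype)
    return tf.tensor_scatter_nd_update(mask, indices, updates)


# Hypothetical 14 x 14 patch grid with a 4 x 5 block masked at (2, 3).
mask = tf.zeros([14, 14], dtype=tf.int32)
mask = set_block(mask, top=2, left=3, h=4, w=5)
```

Because the result is a new tensor, the caller rebinds `mask` after each block instead of mutating it.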
Bhack
March 15, 2022, 1:27pm
15
E.g., I think the embedding in the Hugging Face transformers library, even though it uses PyTorch ops, does not require or use slice assignment:
        self.mask_token = None
        self.patch_embeddings = BeitPatchEmbeddings(config)
        self.patch_size = config.patch_size
        self.image_size = (
            config.image_size
            if isinstance(config.image_size, collections.abc.Iterable)
            else (config.image_size, config.image_size)
        )
        num_patches = self.patch_embeddings.num_patches
        if config.use_absolute_position_embeddings:
            self.position_embeddings = nn.Parameter(torch.zeros(1, num_patches + 1, config.hidden_size))
        else:
            self.position_embeddings = None
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def interpolate_pos_encoding(self, embeddings: torch.Tensor, height: int, width: int) -> torch.Tensor:
        """
        This method allows the model to interpolate the pre-trained position encodings so that it can be used on
        higher resolution images.
I think you’re mistaken, then. `bool_masked_pos` in `forward()` is nothing but the mask produced by the class I showed in my initial post.
Bhack
March 15, 2022, 9:09pm
17
It is true that `bool_masked_pos` is only the “application” of the mask; preparing the mask is still the responsibility of the external caller.
I don’t see all the details of the reference implementation in the paper, but with the concrete reference implementation you shared, with all its attempts, conditional loops, etc., you could try to use a `tf.Variable` to mimic that implementation, though you will probably need to refactor it further for graph mode/`tf.function`:
Absolutely. And in case no reference implementations are available I guess the implementation done by the actual author comes to the rescue.
There isn’t much about it in the paper apart from the figure on block-wise masking which is why the original implementation is an important reference point.
Thanks for sharing your implementation. Will check it out.
Bhack
March 17, 2022, 1:58pm
19
Getting a `tf.function`/graph version is quite trivial, with a few changes and substitutions of TF ops.
But a `jit_compile=True` version will require a new design and probably some compromises.
Let me know if you have a `jit_compile=True` version.
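One possible direction for a more compiler-friendly design, offered only as a sketch: avoid assignment altogether and express the rectangle as a comparison against broadcast row and column indices, so that every shape is static. The function name `add_block` and the grid size are illustrative assumptions. The example uses plain `tf.function` rather than `jit_compile=True`, so treat it as a starting point for a compiled version, not as one.

```python
import tensorflow as tf


@tf.function
def add_block(mask, top, left, h, w):
    """Mask an h x w block at (top, left) without any item assignment."""
    height, width = mask.shape  # static Python ints for a fixed-size grid
    rows = tf.range(height)[:, None]  # shape (height, 1)
    cols = tf.range(width)[None, :]   # shape (1, width)
    # True exactly inside the rectangle, via broadcasting to (height, width).
    inside = (rows >= top) & (rows < top + h) & (cols >= left) & (cols < left + w)
    new_mask = tf.where(inside, tf.ones_like(mask), mask)
    # Number of patches that were newly masked by this block.
    delta = tf.reduce_sum(new_mask - mask)
    return new_mask, delta


mask0 = tf.zeros([14, 14], dtype=tf.int32)
mask1, delta1 = add_block(mask0, tf.constant(2), tf.constant(3), tf.constant(4), tf.constant(5))
# A second block that partially overlaps the first one.
mask2, delta2 = add_block(mask1, tf.constant(4), tf.constant(6), tf.constant(3), tf.constant(4))
```

Passing the block coordinates as tensors keeps the function from being retraced for every new rectangle.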
What’s trivial for you may not be trivial to someone else.
Bhack
March 17, 2022, 2:25pm
21
Let me know when you have the same thing I posted, but with TF ops instead of NumPy ops.
I will help you make the required changes for `tf.function`.
Bhack
March 17, 2022, 10:23pm
22
This already works with `tf.function`, with minimal changes:
    @tf.function
    def _mask(self, mask, max_mask_patches):
        delta = 0
        for attempt in tf.range(10):
            target_area = random.uniform(self.min_num_patches, max_mask_patches)
            aspect_ratio = math.exp(random.uniform(*self.log_aspect_ratio))
            h = int(round(math.sqrt(target_area * aspect_ratio)))
            w = int(round(math.sqrt(target_area / aspect_ratio)))
            if w < self.width and h < self.height:
                top = random.randint(0, int(self.height - h))
                left = random.randint(0, int(self.width - w))
                num_masked = tf.math.count_nonzero(mask[top: top + h, left: left + w])
                # Overlap
                if 0 < h * w - num_masked and h * w - num_masked <= max_mask_patches:
                    for i in range(top, top + h):
                        for j in range(left, left + w):
                            if mask[i, j] == 0:
                                mask[i, j].assign(1)
                                delta += 1
                if delta > 0:
                    break
        return delta
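The `mask[i, j].assign(1)` call above only works when `mask` is a `tf.Variable`; a plain `EagerTensor` has no `assign`. As a minimal eager-mode sketch of the same idea, the whole rectangle can be written in one sliced assignment instead of looping over patches. The helper name `mask_block` and the 14x14 grid are illustrative assumptions. Note also that Python `random` calls inside a `tf.function` are evaluated at trace time, so a traced loop would reuse the same draw; `tf.random.uniform` would be the graph-side alternative.

```python
import tensorflow as tf


def mask_block(mask, top, left, h, w):
    """Set an h x w block of a tf.Variable to 1; return how many entries changed."""
    block = mask[top:top + h, left:left + w]
    # Entries that are still 0 inside the block are the ones newly masked.
    newly_masked = h * w - int(tf.math.count_nonzero(block))
    # Sliced assignment is supported on tf.Variable, unlike on EagerTensor.
    mask[top:top + h, left:left + w].assign(tf.ones([h, w], dtype=mask.dtype))
    return newly_masked


# Hypothetical 14 x 14 patch grid held in a variable.
mask = tf.Variable(tf.zeros([14, 14], dtype=tf.int32))
delta_first = mask_block(mask, top=2, left=3, h=4, w=5)
# A second block that overlaps the first in four patches.
delta_second = mask_block(mask, top=4, left=6, h=3, w=4)
```

Here the variable is mutated in place, which mirrors the NumPy reference more closely than the scatter-style approaches that return a new tensor.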