StringLookup layer broken after upgrading TensorFlow

Hi,

Upgrading TensorFlow from version 2.14.0 to 2.16.1 breaks the instantiation of a StringLookup layer in my project:

id_from_token = tf.keras.layers.StringLookup(vocabulary=VocabularyTensor, mask_token=Universal_Mask_Token)

Results in the following error:

Exception has occurred: AttributeError

'StringLookup' object has no attribute 'encoding'

Anybody else also seeing this error?

Would you mind sharing the full stack trace that comes with the error?


The following code worked fine for me in this colab notebook:

!pip install -U tensorflow

import tensorflow as tf

print(tf.__version__)
print(tf.keras.__version__)

2.16.1
3.3.3

vocab = ["a", "b", "c", "d"]
data = [["a", "c", "d"], ["d", "z", "b"]]
layer = tf.keras.layers.StringLookup(vocabulary=vocab, mask_token="[MASK]")
layer(data)

<tf.Tensor: shape=(2, 3), dtype=int64, numpy=
array([[2, 4, 5],
       [5, 1, 3]])>


Thank you for replying!

Interesting, you pass the vocabulary as a plain Python list.
My code first converts it to a tensor, like this:
vocab = tf.convert_to_tensor(vocab)
layer = tf.keras.layers.StringLookup(vocabulary=vocab, mask_token="[MASK]")
layer(data)

That used to work fine (I'm fairly sure I took it from an example somewhere), but it no longer does.
With the new version of TensorFlow (also in your Colab) I get this error:

    372         vocabulary = vocabulary.numpy()
    373         return np.array(
--> 374             [tf.compat.as_text(x, self.encoding) for x in vocabulary]
    375         )
    376

AttributeError: 'StringLookup' object has no attribute 'encoding'

Regards
Jents

I believe I found the reason for the bug.

In the implementation of class StringLookup(IndexLookup), we find:

        super().__init__(
            max_tokens=max_tokens,
            num_oov_indices=num_oov_indices,
            mask_token=mask_token,
            oov_token=oov_token,
            vocabulary=vocabulary,
            idf_weights=idf_weights,
            invert=invert,
            output_mode=output_mode,
            pad_to_max_tokens=pad_to_max_tokens,
            sparse=sparse,
            name=name,
            vocabulary_dtype="string",
            **kwargs,
        )
        self.encoding = encoding
        self._convert_input_args = False
        self._allow_non_tensor_positional_args = True
        self.supports_jit = False

Note that it invokes the superclass (IndexLookup) constructor before setting the encoding. Then, in the implementation of IndexLookup.__init__, we find:

        if vocabulary is not None:
            self.set_vocabulary(vocabulary, idf_weights)

But set_vocabulary invokes _tensor_vocab_to_numpy:

        if tf.is_tensor(vocabulary):
            vocabulary = self._tensor_vocab_to_numpy(vocabulary)

Which tries to access self.encoding:

    # Overridden methods from IndexLookup.
    def _tensor_vocab_to_numpy(self, vocabulary):
        vocabulary = vocabulary.numpy()
        return np.array(
            [tf.compat.as_text(x, self.encoding) for x in vocabulary]
        )

Since self.encoding is not yet initialized, an error occurs.
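The same failure mode can be reproduced without TensorFlow. The sketch below is only an illustration of the ordering problem: BaseLookup and BrokenStringLookup are hypothetical stand-ins for IndexLookup and StringLookup, not the real Keras classes, and the conversion hook is simplified to a plain bytes decode.

```python
class BaseLookup:
    """Hypothetical stand-in for IndexLookup (not the real Keras class)."""

    def __init__(self, vocabulary=None):
        # The base constructor calls an overridable hook before it returns.
        if vocabulary is not None:
            self.set_vocabulary(vocabulary)

    def set_vocabulary(self, vocabulary):
        self.vocabulary = self._convert(vocabulary)

    def _convert(self, vocabulary):
        return list(vocabulary)


class BrokenStringLookup(BaseLookup):
    """Hypothetical stand-in that copies the Keras 3 constructor ordering."""

    def __init__(self, vocabulary=None, encoding="utf-8"):
        super().__init__(vocabulary=vocabulary)  # runs _convert first
        self.encoding = encoding  # assigned too late

    def _convert(self, vocabulary):
        # Reads self.encoding, which does not exist yet during __init__.
        return [item.decode(self.encoding) for item in vocabulary]


def try_build():
    """Return the error message raised during construction, or None."""
    try:
        BrokenStringLookup(vocabulary=[b"a", b"b"])
    except AttributeError as exc:
        return str(exc)
    return None


message = try_build()
print(message)
```

Constructing the subclass without a vocabulary succeeds, because the hook is never called. That matches the observation that only some ways of passing the vocabulary trigger the error.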

It seems version 3.0.0 of Keras introduced this bug. In version 2.15.0, the StringLookup constructor initializes self.encoding before calling the superclass constructor:

        self.encoding = encoding

        super().__init__(
            max_tokens=max_tokens,
            num_oov_indices=num_oov_indices,
            mask_token=mask_token,
            oov_token=oov_token,
            vocabulary=vocabulary,
            vocabulary_dtype=tf.string,
            idf_weights=idf_weights,
            invert=invert,
            output_mode=output_mode,
            sparse=sparse,
            pad_to_max_tokens=pad_to_max_tokens,
            **kwargs
        )
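As a rough illustration of why the 2.15.0 ordering is safe, here is a minimal sketch with hypothetical stand-in classes (BaseLookup and FixedStringLookup are not the real Keras classes, and this is not claimed to be the actual patch). The attribute is assigned before the base constructor runs, so the overridden hook can read it.

```python
class BaseLookup:
    """Hypothetical stand-in for IndexLookup (not the real Keras class)."""

    def __init__(self, vocabulary=None):
        if vocabulary is not None:
            self.vocabulary = self._convert(vocabulary)

    def _convert(self, vocabulary):
        return list(vocabulary)


class FixedStringLookup(BaseLookup):
    """Hypothetical stand-in that copies the 2.15.0 constructor ordering."""

    def __init__(self, vocabulary=None, encoding="utf-8"):
        self.encoding = encoding  # set before the base constructor needs it
        super().__init__(vocabulary=vocabulary)

    def _convert(self, vocabulary):
        # self.encoding is already available at this point.
        return [item.decode(self.encoding) for item in vocabulary]


layer = FixedStringLookup(vocabulary=[b"a", b"b"])
print(layer.vocabulary)
```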

I have reported this bug here.


Interesting. Thanks!

The fix for this bug is in place and will presumably be available in the next release of Keras.
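For anyone stuck on an affected release in the meantime, one possible workaround is to hand StringLookup a plain Python list of strings instead of a tensor, since the list-based example earlier in this thread ran fine on 2.16.1. The helper below is a hypothetical sketch (vocab_to_text is my own name, not a TensorFlow function), and it assumes the entries are UTF-8 encoded bytes, which is what calling .numpy() on a string tensor yields. The TensorFlow usage is shown only as a comment and was not run here.

```python
def vocab_to_text(items, encoding="utf-8"):
    """Decode an iterable of bytes or str vocabulary entries to a list of str.

    Hypothetical helper; the default encoding is an assumption.
    """
    return [
        item.decode(encoding) if isinstance(item, bytes) else str(item)
        for item in items
    ]


# Assumed usage with TensorFlow (not executed in this sketch):
#   vocab_list = vocab_to_text(vocab_tensor.numpy())
#   layer = tf.keras.layers.StringLookup(vocabulary=vocab_list, mask_token="[MASK]")

decoded = vocab_to_text([b"a", "b", b"\xc3\xa9"])
print(decoded)
```

Because the vocabulary is no longer a tensor, set_vocabulary should skip the _tensor_vocab_to_numpy call that reads self.encoding.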