StringLookup layer broken after upgrade of tensorflow

Jents · May 28, 2024, 5:52am

Hi,

Upgrading TensorFlow from version 2.14.0 to 2.16.1 breaks the instantiation of a StringLookup layer in my project:

id_from_token = tf.keras.layers.StringLookup(vocabulary=VocabularyTensor, mask_token=Universal_Mask_Token)

Results in the following error:

Exception has occurred: AttributeError

'StringLookup' object has no attribute 'encoding'

Is anybody else seeing this error?

rcauvin · May 28, 2024, 12:45pm

Would you mind sharing the full stack trace that comes with the error?

rcauvin · May 28, 2024, 1:22pm

The following code worked fine for me in this Colab notebook:

!pip install -U tensorflow

import tensorflow as tf

print(tf.__version__)
print(tf.keras.__version__)

2.16.1
3.3.3

vocab = ["a", "b", "c", "d"]
data = [["a", "c", "d"], ["d", "z", "b"]]
layer = tf.keras.layers.StringLookup(vocabulary=vocab, mask_token="[MASK]")
layer(data)

<tf.Tensor: shape=(2, 3), dtype=int64, numpy=
array([[2, 4, 5],
       [5, 1, 3]])>

Jents · May 28, 2024, 3:37pm

Thank you for replying!

Interesting: you pass the vocabulary to the StringLookup layer as a plain Python list.
But my code first converts it into a tensor, like this:
vocab = tf.convert_to_tensor(vocab)
layer = tf.keras.layers.StringLookup(vocabulary=vocab, mask_token="[MASK]")
layer(data)

That used to work fine (I'm pretty sure I got it from an example somewhere), but it no longer does.
With the new version of TensorFlow (also in your Colab notebook), I get this error:

    372         vocabulary = vocabulary.numpy()
    373         return np.array(
--> 374             [tf.compat.as_text(x, self.encoding) for x in vocabulary]
    375         )
    376

AttributeError: 'StringLookup' object has no attribute 'encoding'

Regards
Jents

rcauvin · May 28, 2024, 9:17pm

I believe I found the reason for the bug.

In the implementation of class StringLookup(IndexLookup), we find:

        super().__init__(
            max_tokens=max_tokens,
            num_oov_indices=num_oov_indices,
            mask_token=mask_token,
            oov_token=oov_token,
            vocabulary=vocabulary,
            idf_weights=idf_weights,
            invert=invert,
            output_mode=output_mode,
            pad_to_max_tokens=pad_to_max_tokens,
            sparse=sparse,
            name=name,
            vocabulary_dtype="string",
            **kwargs,
        )
        self.encoding = encoding
        self._convert_input_args = False
        self._allow_non_tensor_positional_args = True
        self.supports_jit = False

Note that it invokes the superclass (IndexLookup) constructor before setting the encoding. Then, in the implementation of IndexLookup.__init__, we find:

        if vocabulary is not None:
            self.set_vocabulary(vocabulary, idf_weights)

But when the vocabulary is a tensor, set_vocabulary invokes _tensor_vocab_to_numpy:

        if tf.is_tensor(vocabulary):
            vocabulary = self._tensor_vocab_to_numpy(vocabulary)

And StringLookup overrides _tensor_vocab_to_numpy with a version that reads self.encoding:

    # Overridden methods from IndexLookup.
    def _tensor_vocab_to_numpy(self, vocabulary):
        vocabulary = vocabulary.numpy()
        return np.array(
            [tf.compat.as_text(x, self.encoding) for x in vocabulary]
        )

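Since self.encoding has not been assigned yet at that point, the AttributeError is raised. This also explains why a plain Python list works: it is not a tensor, so _tensor_vocab_to_numpy is never called.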
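The same initialization-order trap can be reproduced without TensorFlow. The sketch below is a simplified model, not Keras code: BaseLookup, BrokenStringLookup and FakeTensor are hypothetical stand-ins for IndexLookup, the Keras 3 StringLookup and an eager string tensor, and only the order of calls is modeled.

```python
# Framework-free sketch of the initialization-order bug.
# All class names are hypothetical; only the call order of Keras's
# IndexLookup/StringLookup constructors is modeled here.


class FakeTensor:
    """Hypothetical stand-in for an eager string tf.Tensor."""

    def __init__(self, values):
        self._values = [v.encode("utf-8") for v in values]

    def numpy(self):
        # Like a string tensor's .numpy(), hand back the entries as bytes.
        return list(self._values)


class BaseLookup:
    """Plays the role of IndexLookup."""

    def __init__(self, vocabulary=None):
        # The base constructor processes the vocabulary right away.
        if vocabulary is not None:
            self.set_vocabulary(vocabulary)

    def set_vocabulary(self, vocabulary):
        # Mirrors the tf.is_tensor(vocabulary) branch: only "tensors" are converted.
        if isinstance(vocabulary, FakeTensor):
            vocabulary = self._tensor_vocab_to_numpy(vocabulary)
        self.vocabulary = list(vocabulary)

    def _tensor_vocab_to_numpy(self, vocabulary):
        return vocabulary.numpy()


class BrokenStringLookup(BaseLookup):
    """Plays the role of the Keras 3 StringLookup."""

    def __init__(self, vocabulary=None, encoding="utf-8"):
        super().__init__(vocabulary=vocabulary)  # may call the override below...
        self.encoding = encoding                  # ...before this line has run

    def _tensor_vocab_to_numpy(self, vocabulary):
        # The override depends on self.encoding.
        return [v.decode(self.encoding) for v in vocabulary.numpy()]


print(BrokenStringLookup(["a", "b"]).vocabulary)  # a plain list never hits the override
try:
    BrokenStringLookup(FakeTensor(["a", "b"]))
except AttributeError as err:
    print("AttributeError:", err)
```

The list case succeeds and the "tensor" case fails with the same kind of AttributeError reported above, which matches the list-versus-tensor difference observed earlier in the thread.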

It seems version 3.0.0 of Keras introduced this bug. In version 2.15.0, the StringLookup constructor initializes self.encoding before calling the superclass constructor:

        self.encoding = encoding

        super().__init__(
            max_tokens=max_tokens,
            num_oov_indices=num_oov_indices,
            mask_token=mask_token,
            oov_token=oov_token,
            vocabulary=vocabulary,
            vocabulary_dtype=tf.string,
            idf_weights=idf_weights,
            invert=invert,
            output_mode=output_mode,
            sparse=sparse,
            pad_to_max_tokens=pad_to_max_tokens,
            **kwargs
        )
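To see why the 2.15.0 ordering avoids the error, here is the same simplified model with the assignment moved above the superclass call. As before, BaseLookup, FixedStringLookup and FakeTensor are hypothetical names, and this is only a sketch of the ordering, not the actual Keras patch.

```python
# Framework-free sketch of the 2.15.0-style ordering (hypothetical classes).


class FakeTensor:
    """Hypothetical stand-in for an eager string tf.Tensor."""

    def __init__(self, values):
        self._values = [v.encode("utf-8") for v in values]

    def numpy(self):
        return list(self._values)


class BaseLookup:
    """Plays the role of IndexLookup: handles the vocabulary in __init__."""

    def __init__(self, vocabulary=None):
        if vocabulary is not None:
            self.set_vocabulary(vocabulary)

    def set_vocabulary(self, vocabulary):
        if isinstance(vocabulary, FakeTensor):
            vocabulary = self._tensor_vocab_to_numpy(vocabulary)
        self.vocabulary = list(vocabulary)

    def _tensor_vocab_to_numpy(self, vocabulary):
        return vocabulary.numpy()


class FixedStringLookup(BaseLookup):
    def __init__(self, vocabulary=None, encoding="utf-8"):
        # Assign every attribute the overridden hook needs *before*
        # handing control to the base constructor.
        self.encoding = encoding
        super().__init__(vocabulary=vocabulary)

    def _tensor_vocab_to_numpy(self, vocabulary):
        return [v.decode(self.encoding) for v in vocabulary.numpy()]


layer = FixedStringLookup(FakeTensor(["a", "b", "c"]))
print(layer.vocabulary)
```

The general rule this illustrates: if a base-class constructor may call an overridable method, any attribute that the override reads has to be set before super().__init__() runs.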

I have reported this bug here.

Jents · May 31, 2024, 7:24pm

Interesting. Thanks!

rcauvin · May 31, 2024, 7:51pm

The fix for this bug is in place and will presumably be available in the next release of Keras.
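Until a release containing the fix is installed, one workaround consistent with the analysis above is to avoid passing a tensor to the constructor: only tensor vocabularies go through the tf.is_tensor branch in set_vocabulary, so converting the tensor to plain Python strings first (as in the working list example) sidesteps the failing path. The helper name vocab_to_str_list is hypothetical, and this is a sketch under that assumption, not an official fix.

```python
def vocab_to_str_list(vocab, encoding="utf-8"):
    """Convert a vocabulary into a plain list of Python str.

    Accepts an eager string tensor (anything with a .numpy() method),
    a NumPy array of bytes, or an ordinary list of str/bytes.
    """
    if hasattr(vocab, "numpy"):
        # Eager tf.Tensor: .numpy() yields an array of bytes objects.
        vocab = vocab.numpy()
    # NumPy's bytes scalars subclass bytes, so one isinstance check covers both.
    return [v.decode(encoding) if isinstance(v, bytes) else str(v) for v in vocab]
```

With the original code this would be used as tf.keras.layers.StringLookup(vocabulary=vocab_to_str_list(VocabularyTensor), mask_token=Universal_Mask_Token).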
