I’m a huge fan of tf.data, mainly because of how it speeds up preprocessing of large datasets that don’t fit in memory. I have been using the Keras preprocessing layers for a while now, and I’m still struggling with one main issue: adapting multiple layers at once.
In the example introducing preprocessing layers in Keras, the author shows this snippet:
text_vectorizer = tf.keras.layers.TextVectorization(
    output_mode='multi_hot', max_tokens=2500)
features = train_ds.map(lambda x, y: x)
text_vectorizer.adapt(features)

normalizer = tf.keras.layers.Normalization(axis=None)
normalizer.adapt(features.map(lambda x: tf.strings.length(x)))

def preprocess(x):
    multi_hot_terms = text_vectorizer(x)
    normalized_length = normalizer(tf.strings.length(x))
    # Combine the multi-hot encoding with review length.
    return tf.keras.layers.concatenate((multi_hot_terms, normalized_length))

def forward_pass(x):
    return tf.keras.layers.Dense(1)(x)  # Linear model.

inputs = tf.keras.Input(shape=(1,), dtype='string')
outputs = forward_pass(preprocess(inputs))
model = tf.keras.Model(inputs, outputs)
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))
model.fit(train_ds, epochs=5)
Now, because we call adapt twice, this code iterates over the dataset two times. My question is: is there a way to change this code so that both layers are adapted in a single pass over the data? Something like the Model class, but for preprocessing, along these lines:
class Preprocessor(tf.made_up_class.Preprocess):
    def __init__(self, **kwargs):
        self.text_vectorizer = tf.keras.layers.TextVectorization(
            output_mode='multi_hot', max_tokens=2500)
        self.normalizer = tf.keras.layers.Normalization(axis=None)

    def adapt(self, x):
        vectorized_text = self.text_vectorizer(x)
        out = self.normalizer(vectorized_text)
        return out
preprocessor = Preprocessor()
preprocessor.adapt(features)
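The closest workaround I know of is to cache the extracted features so that only the first adapt call runs the full input pipeline, and the second one reads from memory. It still isn't a true single pass, but it avoids re-reading from disk. A minimal sketch with a toy stand-in for train_ds:

```python
import tensorflow as tf

# Toy stand-in for train_ds: (text, label) pairs.
train_ds = tf.data.Dataset.from_tensor_slices(
    (["good movie", "bad film", "great plot"], [1, 0, 1])).batch(2)

# cache() materializes the mapped features after the first full iteration,
# so the second adapt reads from memory instead of re-running the pipeline.
features = train_ds.map(lambda x, y: x).cache()

text_vectorizer = tf.keras.layers.TextVectorization(
    output_mode='multi_hot', max_tokens=2500)
text_vectorizer.adapt(features)  # first pass: runs and fills the cache

normalizer = tf.keras.layers.Normalization(axis=None)
normalizer.adapt(features.map(lambda x: tf.strings.length(x)))  # reads the cache
```

Of course this only helps when the features fit in memory (or on local disk, with `cache(filename)`), which is exactly the case where the repeated passes hurt the least.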
Maybe this specific example is tricky, but with structured data one often ends up adapting many StringLookup layers for different columns, which can take hours if the data is big.
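To make the structured-data case concrete, here is a sketch (column names are made up) of the pattern I mean, where each StringLookup's adapt call triggers its own full pass over the dataset:

```python
import tensorflow as tf

# Hypothetical structured dataset: a dict of string columns.
ds = tf.data.Dataset.from_tensor_slices({
    "city": ["paris", "tokyo", "paris"],
    "color": ["red", "blue", "red"],
}).batch(2)

lookups = {}
for col in ["city", "color"]:  # one full pass over ds per column
    lookup = tf.keras.layers.StringLookup()
    # Bind col via a default argument so each lambda captures its own column.
    lookup.adapt(ds.map(lambda row, c=col: row[c]))
    lookups[col] = lookup
```

With dozens of categorical columns this loop means dozens of passes over the same data, which is the cost I'd like to collapse into one.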
I saw this post about a new package, but I’m not sure it preprocesses the features in one pass.