I am working on a paper comparing Python libraries for machine learning and deep learning.
While trying to evaluate Keras and TensorFlow separately, I'm looking for information about TensorFlow methods or functions that can be used to preprocess datasets, similar to those included in scikit-learn (sklearn.preprocessing) or the Keras preprocessing layers, but I can't find anything beyond one-hot encoding for labels…
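For reference, this is the kind of thing I mean, a minimal one-hot sketch with made-up label values:

```python
import tensorflow as tf

# Integer class labels for four examples (toy data).
labels = tf.constant([0, 2, 1, 2])

# Encode them as one-hot vectors over 3 classes -> shape (4, 3).
one_hot = tf.one_hot(labels, depth=3)
```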
TensorFlow Transform (TFT) can actually even take in Keras preprocessing layers, with certain caveats. It uses Apache Beam to scale the pipeline and does a lot to help with pipeline reproducibility.
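As a rough sketch of what TFT gives you, the core of a TFT pipeline is a `preprocessing_fn`; its analyzers (vocabulary building, statistics) run as full-pass Beam jobs rather than in memory. The feature names `age`, `occupation` and `label` here are made-up placeholders, not anything mandated by the library:

```python
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """Transform raw features; full-pass statistics are computed by Beam."""
    return {
        # Scale to zero mean / unit variance using dataset-wide statistics.
        "age_scaled": tft.scale_to_z_score(inputs["age"]),
        # Build a vocabulary over the whole dataset, then map strings to ids.
        "occupation_id": tft.compute_and_apply_vocabulary(inputs["occupation"]),
        "label": inputs["label"],
    }
```

The same transform graph produced here is then attached to the serving model, which is how TFT addresses training/serving skew.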
The scikit-learn comparison is especially interesting, as the design choices of an all in-memory approach vs. a streaming approach become quite apparent. The two do share a lot of commonalities, such as the goal of using the same pipeline at training time as at prediction time.

The definition of "pipeline" itself is quite overloaded in the TensorFlow ecosystem: how a TFX pipeline and a TFT pipeline differ, and how they relate to each other, is an interesting point. For example, if I remember correctly, make_column_selector in scikit-learn can be integrated directly into a scikit-learn pipeline, whereas in TFX, TensorFlow Data Validation (TFDV) handles inferring the schema, and TFT consumes and enriches that schema, along with other artifacts, for use downstream (see the sketches below). As such, TFX feels much more decoupled, and necessarily more complex and powerful, with a steeper learning curve.
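To make the contrast concrete, here is a minimal scikit-learn sketch in which column selection is just another step inside the in-memory pipeline; the choice of StandardScaler and OneHotEncoder is illustrative, not prescriptive:

```python
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column selection lives inside the pipeline itself.
preprocess = ColumnTransformer(transformers=[
    ("num", StandardScaler(), make_column_selector(dtype_include="number")),
    ("cat", OneHotEncoder(handle_unknown="ignore"),
     make_column_selector(dtype_include=object)),
])

pipe = Pipeline(steps=[("preprocess", preprocess),
                       ("model", LogisticRegression())])

# The exact same fitted pipeline serves training and prediction:
# pipe.fit(X_train, y_train); pipe.predict(X_new)
```

On the TFX side, by contrast, the schema is produced by a separate TFDV step and handed downstream as an artifact rather than living inside one pipeline object. A minimal sketch (the CSV file name is assumed):

```python
import tensorflow_data_validation as tfdv

# Compute dataset statistics, then infer a schema artifact from them;
# TFT and other components consume this schema downstream.
stats = tfdv.generate_statistics_from_csv(data_location="train.csv")
schema = tfdv.infer_schema(statistics=stats)
```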
Hopefully this is enough to get you started; let me know if you need any further information.