Hello,
I am doing some preprocessing in the model with tabular data. I have many features, some categorical and some numerical. For the numerical ones, the Load CSV data | TensorFlow Core tutorial advises concatenating first and then normalizing. Why not the opposite: normalize each feature, then concatenate? What are the tradeoffs here?
How do I make sure the proper mean is applied to the proper feature? Is the feature order the only thing that ties them together?
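To illustrate what I mean, here is a minimal sketch (with made-up values, not my real data): when `Normalization(axis=-1)` is adapted on the concatenated columns, it seems to compute one mean/variance per column, so each feature is standardized independently, and the mapping relies entirely on column order:

```python
import numpy as np
import tensorflow as tf

# Two toy features with very different scales, concatenated column-wise.
f1 = np.array([0., 10., 20., 30.], dtype=np.float32)[:, None]
f2 = np.array([0., 70., 100., 30.], dtype=np.float32)[:, None]
concatenated = np.concatenate([f1, f2], axis=1)  # shape (4, 2)

# With axis=-1, adapt() keeps the last axis, so it computes a separate
# mean and variance for each of the two columns.
norm = tf.keras.layers.Normalization(axis=-1)
norm.adapt(concatenated)

out = norm(concatenated).numpy()
# Each column is standardized on its own: per-column mean ~0, std ~1.
print(out.mean(axis=0))
print(out.std(axis=0))
```

So as far as I can tell, normalize-then-concatenate and concatenate-then-normalize give the same numbers, as long as the column order at adapt time matches the order at call time.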
Should I follow the same order (concatenate then normalize) if I use Discretization instead of Normalization?
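My concern with Discretization is that, if I understand the docs correctly, `bin_boundaries` is a single flat list applied to every element, so per-feature bins would need one layer per feature applied before concatenation. A sketch of what I mean, with the same toy boundaries as in my code below:

```python
import tensorflow as tf

# Assumption: Discretization applies one set of bin_boundaries to every
# element, so per-feature bins mean one layer per feature, then concatenate.
bins = {'f1': [0., 10., 20., 30.], 'f2': [0., 70., 100.]}

inputs = {name: tf.keras.Input(shape=(1,), name=name) for name in bins}
discretized = [
    tf.keras.layers.Discretization(bin_boundaries=b)(inputs[name])
    for name, b in sorted(bins.items())
]
output = tf.keras.layers.Concatenate(axis=-1)(discretized)
model = tf.keras.Model(inputs, output)

# f1=15 falls in bucket [10, 20) -> index 2; f2=80 in [70, 100) -> index 2.
print(model({'f1': tf.constant([[15.]]), 'f2': tf.constant([[80.]])}))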
Here is the code I use (with concatenate then normalize):
import numpy as np
import pandas as pd
import tensorflow as tf

numeric_features = df[numerical_features_names]
numeric_features_dict = {
    key: value.to_numpy()[:, tf.newaxis]
    for key, value in dict(numeric_features).items()
}

normalize_num = False
if normalize_num:
    layer1 = tf.keras.layers.Normalization(axis=-1)
    # adapt on the concatenated columns, in sorted feature order
    layer1.adapt(
        np.concatenate(
            [value for key, value in sorted(numeric_features_dict.items())],
            axis=1,
        )
    )
else:
    layer1_discretization_params_dict = {
        'f1': [0, 10, 20, 30],
        'f2': [0, 70, 100],
    }
    layer1 = tf.keras.layers.Discretization(
        bin_boundaries=[
            layer1_discretization_params_dict[key]
            for key, value in sorted(numeric_features_dict.items())
        ]
    )

numeric_inputs = []
for name in numerical_features_names:
    # inputs[name] = tf.keras.Input(shape=(1,), name=name, dtype=dtype)
    numeric_inputs.append(inputs[name])
numeric_inputs = tf.keras.layers.Concatenate(axis=-1)(numeric_inputs)
numeric_normalized = layer1(numeric_inputs)
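For comparison, here is a sketch of the opposite order I am asking about (normalize then concatenate), again with made-up feature values `f1`/`f2` rather than my real data:

```python
import numpy as np
import tensorflow as tf

# Opposite order: one Normalization layer per feature, each adapted on
# that feature alone, then concatenate the normalized outputs.
features = {
    'f1': np.array([0., 10., 20., 30.], dtype=np.float32)[:, None],
    'f2': np.array([0., 70., 100., 30.], dtype=np.float32)[:, None],
}

inputs, normalized = {}, []
for name, values in sorted(features.items()):
    inputs[name] = tf.keras.Input(shape=(1,), name=name)
    layer = tf.keras.layers.Normalization(axis=-1)
    layer.adapt(values)  # per-feature mean/variance
    normalized.append(layer(inputs[name]))

output = tf.keras.layers.Concatenate(axis=-1)(normalized)
model = tf.keras.Model(inputs, output)
```

Here the mean/feature pairing is explicit (each layer only ever sees one feature), instead of depending on column order at adapt time.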
Thank you.
Bruno