Concatenate then normalize, or normalize then concatenate?

Hello,

I am doing some preprocessing inside the model, with tabular data. I have many features, some categorical, some numerical. For the numerical ones, the "Load CSV data" tutorial (TensorFlow Core) advises concatenating first and then normalizing. Why not the opposite: normalize, then concatenate? What are the tradeoffs here?

How do I make sure the proper mean is applied to the proper feature? Does it depend only on the order of the features?

Should I follow the same order (concatenate then normalize) if I use Discretization instead of Normalization?

Here is the code I use (with concatenate then normalize):

numeric_features = df[numerical_features_names]
numeric_features_dict = {key: value.to_numpy()[:, tf.newaxis] for key, value in dict(numeric_features).items()}

normalize_num = False
if normalize_num:
    layer1 = tf.keras.layers.Normalization(axis=-1)
    # adapt() on columns in the same order that Concatenate uses further down
    layer1.adapt(np.concatenate([numeric_features_dict[name] for name in numerical_features_names], axis=1))
else:
    layer1_discretization_params_dict = {
        'f1': [0, 10, 20, 30],
        'f2': [0, 70, 100]
        }
    # look up the boundaries in the dict defined just above, in the same feature order
    layer1 = tf.keras.layers.Discretization(bin_boundaries=[layer1_discretization_params_dict[name] for name in numerical_features_names])


numeric_inputs = []
for name in numerical_features_names:
    # inputs[name] = tf.keras.Input(shape=(1,), name=name, dtype=dtype)
    numeric_inputs.append(inputs[name])

numeric_inputs = tf.keras.layers.Concatenate(axis=-1)(numeric_inputs)
numeric_normalized = layer1(numeric_inputs)

Thank you.

Bruno

Sure! Here are a few points:

  • Great question! Normalizing before concatenation can sometimes lead to unintended scaling issues.
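  • The order matters because the Normalization layer computes its statistics from the concatenated features: with axis=-1 it keeps one mean and one variance per column, so the columns passed to adapt() must be in the same order as the columns the layer receives in the model.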
  • Yes, the same principle applies to discretization – concatenate first to ensure consistent binning.
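To make the point about per-column statistics concrete, here is a minimal NumPy sketch rather than the Keras layer itself. It assumes that Normalization(axis=-1) amounts to standardizing each column with its own mean and variance, up to small numerical details. The data is synthetic and `standardize` is a hypothetical helper, not a TensorFlow function. Under that assumption the two orders give the same numbers, and what can go wrong is a column order at call time that differs from the one used to compute the statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two made-up numeric features on very different scales.
f1 = rng.normal(loc=15.0, scale=5.0, size=(100, 1))
f2 = rng.normal(loc=80.0, scale=20.0, size=(100, 1))


def standardize(x):
    # One mean and one variance per column (axis=0 reduces over rows).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0))


# Concatenate then normalize.
X = np.concatenate([f1, f2], axis=1)
concat_then_norm = standardize(X)

# Normalize then concatenate.
norm_then_concat = np.concatenate([standardize(f1), standardize(f2)], axis=1)

# Per-column statistics make the two orders equivalent.
same = np.allclose(concat_then_norm, norm_then_concat)
print("same result:", same)

# Statistics computed for columns (f1, f2) but applied to (f2, f1):
# each column is shifted and scaled with the other feature's statistics.
mean, std = X.mean(axis=0), X.std(axis=0)
wrong = (np.concatenate([f2, f1], axis=1) - mean) / std
print("column means after mismatched order:", wrong.mean(axis=0))
```

If that assumption holds for the Keras layer, the practical difference between the two orders is mostly bookkeeping: one layer and one adapt() call for all numeric columns, versus one layer per feature.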

OK, thank you for the explanation. I'll be sure to concatenate first now :)

Still wondering about the unintended scaling issues and in"consistent binning"…
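On the binning question, a small NumPy sketch may help, with `np.digitize` standing in for the Keras layer. It assumes, as the Discretization documentation appears to describe, that `bin_boundaries` is a single flat list applied to every element of the input. The feature values are made up, and the boundaries are the f1/f2 ones from the code above. Under that assumption, per-feature boundaries would need one binning step per feature before concatenation, because a single shared list applied to the concatenated columns bins every feature on the same scale.

```python
import numpy as np

# Boundaries taken from the f1/f2 example in the question; values are made up.
boundaries = {"f1": [0, 10, 20, 30], "f2": [0, 70, 100]}
features = {
    "f1": np.array([-5.0, 5.0, 25.0, 40.0]),
    "f2": np.array([50.0, 70.0, 99.0, 150.0]),
}
names = ["f1", "f2"]  # one fixed feature order, reused everywhere

# Discretize each feature with its own boundaries, then concatenate.
binned = np.stack(
    [np.digitize(features[n], boundaries[n]) for n in names], axis=1
)
print(binned)

# Concatenate first, then apply one shared boundary list (here f1's):
# every f2 value lies above 30, so the f2 column collapses into the top bin.
stacked = np.stack([features[n] for n in names], axis=1)
shared = np.digitize(stacked, boundaries["f1"])
print(shared)
```

In Keras terms this would correspond to one Discretization layer per input followed by Concatenate, which is worth checking against the layer documentation before relying on it.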