Hi everyone,
I am just now learning how to use Keras, so this is likely a very newbie question. I am going through the tutorials and am currently on ‘Load and preprocess data’ > ‘CSV’.
Below is the relevant code:
#Imports used below
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers

#Loading data and splitting features and labels
titanic = pd.read_csv("https://storage.googleapis.com/tf-datasets/titanic/train.csv")
titanic_features = titanic.copy()
titanic_labels = titanic_features.pop('survived')

#Building a dict of symbolic Input tensors, one per feature column
inputs = {}
for name, column in titanic_features.items():
    dtype = column.dtype
    if dtype == object:
        dtype = tf.string
    else:
        dtype = tf.float32
    inputs[name] = tf.keras.Input(shape=(1,), name=name, dtype=dtype)

#Separating the numeric inputs, as we need to normalize them
numeric_inputs = {name: input for name, input in inputs.items()
                  if input.dtype == tf.float32}

#Concatenate the numeric columns
x = layers.Concatenate()(list(numeric_inputs.values()))

#Normalization layer
norm = layers.Normalization()

#Adapt the normalization layer on the raw numeric data
norm.adapt(np.array(titanic[numeric_inputs.keys()]))
all_numeric_inputs = norm(x)
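For context, when I inspect the two objects involved myself (this check is mine, not part of the tutorial), they seem to be completely different kinds of things:

#My own check, not from the tutorial: x is symbolic, the adapt argument is concrete data
print(type(x))                                         # symbolic KerasTensor produced by the Concatenate layer
print(type(np.array(titanic[numeric_inputs.keys()])))  # plain numpy.ndarray holding the raw numeric columns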
So the question is: why do we adapt the normalization layer on titanic[numeric_inputs.keys()], which is the raw data with just the numeric columns selected? Why wouldn’t we adapt it on ‘x’, which is the concatenation of the numeric inputs? Also, why are we adapting on data that has been converted to a NumPy array? Why not adapt on the symbolic tensors, since their whole point is supposed to be keeping track of the operations performed on them? I am very confused about this, and I would appreciate it if someone could explain, or point me to something that helps me understand how these layers and symbolic tensors are supposed to be wired together, i.e. what should call what and in what order, so to say.
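To make the confusion concrete, this is roughly what I expected to be able to write instead (just my mental model, it may well not be valid Keras at all):

#What I expected (possibly wrong): adapt directly on the symbolic concatenated tensor
norm = layers.Normalization()
norm.adapt(x)                # x is the symbolic output of Concatenate, not the raw data
all_numeric_inputs = norm(x)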
Thanks for the help,
Dominik