I need help with the following scenario. Imagine that my original data have this format:
Group | Feature 1 | Feature 2 | Feature 3 | Feature 4 | Target |
---|---|---|---|---|---|
Group A | Value F1.A | Value F2.1 | Value F3.A | Value F4.1 | 1 |
Group B | Value F1.B | Value F2.2 | Value F3.B | Value F4.2 | 0 |
Group A | Value F1.A | Value F2.3 | Value F3.A | Value F4.3 | 1 |
Group C | Value F1.C | Value F2.4 | Value F3.C | Value F4.4 | 0 |
Group C | Value F1.C | Value F2.5 | Value F3.C | Value F4.5 | 0 |
Group A | Value F1.A | Value F2.6 | Value F3.A | Value F4.6 | 1 |
In this example, the values in the Feature 1, Feature 3, and Target columns are the same for every row of a group. I want to split this data into two inputs.
InputA, containing only the features whose values are constant within a group, with one row per group:
Group | Feature 1 | Feature 3 | Target |
---|---|---|---|
Group A | Value F1.A | Value F3.A | 1 |
Group B | Value F1.B | Value F3.B | 0 |
Group C | Value F1.C | Value F3.C | 0 |
And InputB, with the features whose values differ within a group:
Feature 2 | Feature 4 |
---|---|
Value F2.1 | Value F4.1 |
Value F2.2 | Value F4.2 |
Value F2.3 | Value F4.3 |
Value F2.4 | Value F4.4 |
Value F2.5 | Value F4.5 |
Value F2.6 | Value F4.6 |
So I ended up with InputA with 4 columns and 3 rows (shape (3, 4)) and InputB with 2 columns and 6 rows (shape (6, 2)).
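For concreteness, this is how I produce the split with plain pandas. The DataFrame literal below is dummy placeholder data standing in for my real dataset:

```python
import pandas as pd

# Hypothetical stand-in for the original table above (values are placeholders).
df = pd.DataFrame({
    "Group":     ["A", "B", "A", "C", "C", "A"],
    "Feature 1": ["F1.A", "F1.B", "F1.A", "F1.C", "F1.C", "F1.A"],
    "Feature 2": ["F2.1", "F2.2", "F2.3", "F2.4", "F2.5", "F2.6"],
    "Feature 3": ["F3.A", "F3.B", "F3.A", "F3.C", "F3.C", "F3.A"],
    "Feature 4": ["F4.1", "F4.2", "F4.3", "F4.4", "F4.5", "F4.6"],
    "Target":    [1, 0, 1, 0, 0, 1],
})

# InputA: columns that are constant within a group, one row per group.
inputA = df[["Group", "Feature 1", "Feature 3", "Target"]].drop_duplicates()

# InputB: columns that vary within a group, one row per original sample.
inputB = df[["Feature 2", "Feature 4"]]

print(inputA.shape)  # (3, 4)
print(inputB.shape)  # (6, 2)
```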
My question is: can I build a multi-input model using TensorFlow and/or Keras whose inputs have different numbers of columns and rows?
After splitting the target off the InputA data and splitting both inputs into train and test sets, I ended up with six datasets: inputA_train, inputA_test, inputB_train, inputB_test, and the targets (y_train and y_test).
I created the branch for InputA:
a_input = tf.keras.Input(shape=(inputA_train.shape[1],), name='a_input')
x = tf.keras.layers.Flatten()(a_input)
x = tf.keras.layers.Dense(32)(x)
And I also created the branch for InputB:
b_input = tf.keras.Input(shape=(inputB_train.shape[1],), name='b_input')
y = tf.keras.layers.Flatten()(b_input)
y = tf.keras.layers.Dense(64)(y)
After that, I concatenate both branches and build and compile the final model:
z = tf.keras.layers.Concatenate()([x, y])
z = tf.keras.layers.Dense(1, activation="sigmoid")(z)
model = tf.keras.models.Model(inputs=[a_input, b_input], outputs=z)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
Finally, I tried to fit this model:
model.fit(
    x=[inputA_train, inputB_train],
    y=y_train,
    validation_data=([inputA_test, inputB_test], y_test),
    epochs=10,
)
If I try to fit this model using pandas DataFrames, I get the following error:
ValueError: Data cardinality is ambiguous:
Make sure all arrays contain the same number of samples.
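For what it's worth, the error does not seem to depend on my data: Keras raises it for any pair of input arrays whose first dimensions disagree. A minimal sketch with dummy numpy arrays (3 vs. 6 samples, mirroring InputA/InputB; layer sizes are arbitrary):

```python
import numpy as np
import tensorflow as tf

# Tiny two-input model; the layer sizes here are arbitrary placeholders.
a_in = tf.keras.Input(shape=(2,), name="a_input")
b_in = tf.keras.Input(shape=(2,), name="b_input")
z = tf.keras.layers.Concatenate()(
    [tf.keras.layers.Dense(4)(a_in), tf.keras.layers.Dense(4)(b_in)]
)
out = tf.keras.layers.Dense(1, activation="sigmoid")(z)
model = tf.keras.Model(inputs=[a_in, b_in], outputs=out)
model.compile(loss="binary_crossentropy", optimizer="adam")

# 3 samples for the first input, 6 for the second -> cardinality mismatch.
msg = ""
try:
    model.fit(x=[np.zeros((3, 2)), np.zeros((6, 2))],
              y=np.zeros((3, 1)), epochs=1, verbose=0)
except ValueError as err:
    msg = str(err)

print(msg)  # mentions "Data cardinality is ambiguous"
```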
Another question regarding multi-input models: can a single target dataset have a number of samples equal to only one of the model's inputs, or does each input need its own target dataset?
I also tried converting the pandas DataFrames into TensorFlow datasets:
inputA_train_ds = tf.data.Dataset.from_tensor_slices(inputA_train)
inputA_test_ds = tf.data.Dataset.from_tensor_slices(inputA_test)
inputB_train_ds = tf.data.Dataset.from_tensor_slices(inputB_train)
inputB_test_ds = tf.data.Dataset.from_tensor_slices(inputB_test)
I tried to fit the model again using the datasets:
model.fit(
    x=[inputA_train_ds, inputB_train_ds],
    y=y_train,
    validation_data=([inputA_test_ds, inputB_test_ds], y_test),
    epochs=10,
)
Now I get the following error:
ValueError: Failed to find data adapter that can handle input: (<class 'list'> containing values
of types {"<class 'tensorflow.python.data.ops.dataset_ops.TensorSliceDataset'>"}), <class 'pandas.core.series.Series'>
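From what I have read, model.fit does not accept a Python list of separate tf.data.Dataset objects; it expects a single dataset that yields ((input_a, input_b), target) tuples. A sketch with aligned dummy arrays (6 samples each, since from_tensor_slices itself requires matching first dimensions; the shapes are placeholders, and the dict keys match the Input layer names above):

```python
import numpy as np
import tensorflow as tf

# Dummy aligned arrays; the dict keys match the Input layer names above.
a = np.zeros((6, 3), dtype="float32")
b = np.zeros((6, 2), dtype="float32")
y = np.zeros((6, 1), dtype="float32")

# One dataset yielding ((inputs dict), target) pairs, then batched.
train_ds = tf.data.Dataset.from_tensor_slices(
    ({"a_input": a, "b_input": b}, y)
).batch(2)

for features, target in train_ds.take(1):
    print(features["a_input"].shape, features["b_input"].shape, target.shape)
    # (2, 3) (2, 2) (2, 1)
```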
In short: is it possible to build the model I described here, and if so, what do I need to change in my code to fix these errors?
Thanks in advance