Hi, I am implementing a multi-output version of the TFX Taxi pipeline. In this case, the output is a probability distribution over the payment types: ['Cash', 'Credit Card', 'Dispute', 'No Charge', 'Pcard', 'Unknown', 'Prcard']
Here is an example of the input, which consists of the fields trip_miles, pickup_latitude, and trip_start_hour:
Here is an example of the output, a probability distribution across: 'Cash', 'Credit Card', 'Dispute', 'No Charge', 'Pcard', 'Unknown', 'Prcard'
Here is a summary of the ML model’s structure:
(Note: a softmax activation function is used on the output layer so that the outputs form a probability distribution.)
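For reference, a softmax layer maps the raw scores (logits) for the seven payment types to values in (0, 1) that sum to 1. A minimal pure-Python sketch of that mapping (the logit values here are made up for illustration):

```python
import math

PAYMENT_TYPES = ['Cash', 'Credit Card', 'Dispute', 'No Charge',
                 'Pcard', 'Unknown', 'Prcard']

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    # Subtracting the max is the standard numerical-stability trick;
    # it does not change the resulting probabilities.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one trip.
logits = [2.0, 1.5, -1.0, -0.5, -2.0, -1.5, -2.5]
probs = softmax(logits)

# The probabilities sum to 1 (up to floating-point error).
assert abs(sum(probs) - 1.0) < 1e-9
```

This is what `activation='softmax'` on a Dense layer does per example, so the seven output fields always describe one distribution.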
This is the error message I am getting:
The pipeline is run on a VM through the tfx CLI:
tfx run create --pipeline_name=local_runner.py --engine=local
Is it possible to do a multi-field-output model like this in TFX?
Are you using the Keras Functional API? If not you might want to look at it.
Hi, thanks for the heads-up! I just took a look at the Keras Functional API and I think I understand how a multi-output model is supposed to be defined. One interesting thing I realized: with the softmax output I am trying to implement, there is really only one 'output', but with multiple fields.
Here is how I am defining the model in the models.py file. The only change I made was giving the output layer an activation='softmax' argument across however many fields (categories) are in the output columns:
def _wide_and_deep_classifier(wide_columns, deep_columns, output_columns,
                              dnn_hidden_units, learning_rate):
  """Build a simple Keras wide and deep model.

  Args:
    wide_columns: Feature columns wrapped in indicator_column for the wide
      (linear) part of the model.
    deep_columns: Feature columns for the deep part of the model.
    output_columns: [str], names of the output categories; the output layer
      has one unit per category.
    dnn_hidden_units: [int], the layer sizes of the hidden DNN.
    learning_rate: [float], learning rate of the Adam optimizer.

  Returns:
    A Wide and Deep Keras model.
  """
  # Keras needs the feature definitions at compile time.
  # TODO(b/139081439): Automate generation of input layers from FeatureColumn.
  input_layers = {
      colname: tf.keras.layers.Input(name=colname, shape=(), dtype=tf.float32)
      for colname in features.transformed_names(
          features.DENSE_FLOAT_FEATURE_KEYS)
  }
  input_layers.update({
      colname: tf.keras.layers.Input(name=colname, shape=(), dtype='int32')
      for colname in features.transformed_names(features.VOCAB_FEATURE_KEYS)
  })
  input_layers.update({
      colname: tf.keras.layers.Input(name=colname, shape=(), dtype='int32')
      for colname in features.transformed_names(features.BUCKET_FEATURE_KEYS)
  })
  input_layers.update({
      colname: tf.keras.layers.Input(name=colname, shape=(), dtype='int32')
      for colname in features.transformed_names(
          features.CATEGORICAL_FEATURE_KEYS)
  })

  # TODO(b/161952382): Replace with Keras premade models and
  # Keras preprocessing layers.
  deep = tf.keras.layers.DenseFeatures(deep_columns)(input_layers)
  for numnodes in dnn_hidden_units:
    deep = tf.keras.layers.Dense(numnodes)(deep)
  wide = tf.keras.layers.DenseFeatures(wide_columns)(input_layers)

  output = tf.keras.layers.Dense(
      len(output_columns), activation='softmax')(
          tf.keras.layers.concatenate([deep, wide]))

  model = tf.keras.Model(input_layers, output)
  model.compile(
      loss='binary_crossentropy',
      optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
      metrics=[tf.keras.metrics.BinaryAccuracy()])
  model.summary(print_fn=logging.info)
  return model
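One thing worth double-checking in the snippet above: the model compiles with binary_crossentropy and BinaryAccuracy, while the output is a single softmax over len(output_columns) classes. A softmax output is usually paired with categorical cross-entropy (or sparse categorical cross-entropy for integer labels), and a loss/label-shape mismatch here is one plausible source of a Keras error at training time. A pure-Python sketch of what categorical cross-entropy computes for one example, assuming a one-hot label (the probability values are made up for illustration):

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and a predicted distribution."""
    # Clamp predictions away from 0 so log() is always defined.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

# One-hot label: the true payment type is the second class ('Credit Card').
y_true = [0, 1, 0, 0, 0, 0, 0]
# Hypothetical predicted distribution from the softmax layer.
y_pred = [0.10, 0.70, 0.05, 0.05, 0.04, 0.03, 0.03]

loss = categorical_crossentropy(y_true, y_pred)
# Only the probability assigned to the true class contributes: -log(0.70).
assert abs(loss - (-math.log(0.70))) < 1e-9
```

In Keras terms this would correspond to something like loss='categorical_crossentropy' with metrics=[tf.keras.metrics.CategoricalAccuracy()], but whether that is the actual fix depends on the error message and label encoding, which are not shown here.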
I can't help but feel I am doing something incorrectly with the TFX components surrounding the Trainer component. In other words, the model itself may be defined correctly, but I am not declaring the outputs correctly in pipeline.py (or perhaps somewhere else).
I have attached the github repository where I keep all the code+directories for the pipeline here.
It looks to me like the error is coming from Keras during training. I’d suggest creating a version of the training code outside of TFX and just focusing on getting that part working. Then you can put it into a Trainer component and get that working, and then examine the artifacts that Trainer is outputting.