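import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text  # registers the ops the preprocessing model needs

# tfhub_handle_preprocess and tfhub_handle_encoder are TF Hub handles defined elsewhere.
def build_classifier_model():
    # Each example is one scalar string, hence shape=() and dtype=tf.string.
    text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
    # Tokenizes, pads/truncates, and returns a dict of fixed-length integer tensors.
    preprocessing_layer = hub.KerasLayer(tfhub_handle_preprocess, name='preprocessing')
    encoder_inputs = preprocessing_layer(text_input)
    encoder = hub.KerasLayer(tfhub_handle_encoder, trainable=True, name='BERT_encoder')
    outputs = encoder(encoder_inputs)
    net = outputs['pooled_output']
    net = tf.keras.layers.Dropout(0.1)(net)
    # Single output unit with no activation, so the model returns a raw logit.
    net = tf.keras.layers.Dense(1, activation=None, name='classifier')(net)
    return tf.keras.Model(text_input, net)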
This is taken from the TensorFlow tutorial "Classify text with BERT".
I think the reason is that the raw strings tokenize to varying numbers of tokens, and the preprocessing layer automatically makes them uniform (padding or truncating to a fixed length) and converts them to numeric tensors. So we only need to define an entry point for the model, the input layer, and pass the data through: shape=() with dtype=tf.string says that each example is a single scalar string, and an input layer never changes the data it hands to the next layer.
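To make that concrete, here is a minimal pure-Python sketch of what a preprocessing step does with strings of different lengths. It is a toy stand-in, not the real BERT preprocessor: the vocabulary, the whitespace tokenizer, the `toy_preprocess` name, and the sequence length of 6 are all assumptions made up for illustration. The real layer uses WordPiece tokenization and adds special tokens, but the idea is the same: one scalar string per example goes in, fixed-length integer vectors come out. Only the two output key names are borrowed from the real preprocessing model.

```python
# Toy stand-in for a text preprocessing layer (NOT the real BERT preprocessor).
PAD_ID = 0  # id used to fill unused positions
UNK_ID = 1  # id for words missing from the toy vocabulary
TOY_VOCAB = {"this": 2, "movie": 3, "was": 4, "great": 5, "bad": 6}


def toy_preprocess(texts, seq_length=6):
    """Map a batch of raw strings to fixed-length ids plus a padding mask."""
    word_ids, masks = [], []
    for sentence in texts:
        tokens = sentence.lower().split()  # naive whitespace tokenizer
        # Look up ids, then truncate anything longer than seq_length.
        ids = [TOY_VOCAB.get(tok, UNK_ID) for tok in tokens][:seq_length]
        n_real = len(ids)
        padding = seq_length - n_real
        word_ids.append(ids + [PAD_ID] * padding)   # pad short inputs
        masks.append([1] * n_real + [0] * padding)  # 1 = real token, 0 = padding
    return {"input_word_ids": word_ids, "input_mask": masks}


# Three inputs with 4, 1, and 8 words: all rows come out with length 6.
batch = [
    "This movie was great",
    "bad",
    "this movie was really really really bad indeed",
]
features = toy_preprocess(batch)
for ids, mask in zip(features["input_word_ids"], features["input_mask"]):
    print(ids, mask)
```

The input layer in the model plays no part in this conversion. It only declares that a batch of scalar strings is arriving, which is why it can be defined with shape=() regardless of how long each sentence is.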