What I currently have and am trying to do:
When I receive a request from a client to the model in tensorflow-serving, I first need to process the text using 13 regexes, then pass it through tf.keras.preprocessing.text.Tokenizer
to convert it to numbers (tokens), and then pass it to tf.keras.preprocessing.sequence.pad_sequences
to append 0s at the end of each array (for sentences whose length doesn't match the input the model expects) in a batch of inputs. This (a single sentence or a batch of sentences, as tokens) is then fed to a tf.keras model to get probabilities as outputs. I then need to map these probabilities (with different thresholds for different units) to texts and return them to the client.
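For concreteness, here is a plain-Python sketch of the first two steps (regex clean-up, then word-to-id mapping). The patterns in REGEXES and the entries in VOCAB are made-up stand-ins for the real 13 regexes and the Tokenizer's word index; only the shape of the logic is meant to match:

```python
import re

# Hypothetical stand-ins for the 13 regexes: (pattern, replacement) pairs
# applied to the raw text before tokenization.
REGEXES = [
    (re.compile(r"https?://\S+"), " "),  # strip URLs
    (re.compile(r"\s+"), " "),           # collapse whitespace
]

# Toy vocabulary playing the role of Tokenizer's word_index;
# 0 is reserved for padding, unknown words map to an OOV id.
VOCAB = {"<oov>": 1, "hello": 2, "world": 3}

def preprocess(text):
    """Apply the regex substitutions, then map each word to its token id."""
    for pattern, repl in REGEXES:
        text = pattern.sub(repl, text)
    return [VOCAB.get(w, VOCAB["<oov>"]) for w in text.strip().lower().split()]

print(preprocess("Hello   world https://example.com"))  # [2, 3]
```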
What problems am I currently facing trying to accomplish the above:
While trying to put all of that together to serve the model using tensorflow-serving, I learned that some parts can be converted to tensorflow functions, but not all of it.
- regexes: I still can't figure out where and how to put my regexes so that they can manipulate the text.
- tokenizer: I learned from some blogs and SO questions that tf.lookup.StaticHashTable can be used for this purpose.
- pad_sequences: no help with this either.
- post-processing: I could find very little information on how to do this.
I read the beginner and advanced tutorials on the tensorflow-transform page, but neither of them mentions how to attach those tft functions to the tf.keras model while saving it. I could also find some information about adding pre-processing for serving, but all of it involved tensorflow code plus some workarounds, and none of it covered what I am trying to achieve, even indirectly.
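For reference, the padding behavior in question (what pad_sequences does with post-padding) boils down to the following plain-Python logic; maxlen stands in for the input length the model expects, and 0 is the padding value:

```python
def pad_post(sequences, maxlen, value=0):
    """Pad each sequence with `value` at the end, truncating anything longer
    than maxlen — mirroring pad_sequences(padding='post')."""
    return [seq[:maxlen] + [value] * (maxlen - len(seq)) for seq in sequences]

print(pad_post([[2, 3], [2, 3, 4, 5, 6]], maxlen=4))
# [[2, 3, 0, 0], [2, 3, 4, 5]]
```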
I can provide more information as required.
How do I add these steps to the graph, while saving the model?
Thanks for the answer and suggestions.
1 & 3. I will try to adopt tf.strings.regex_replace
for the regex operations on my text, and text.pad_model_inputs
for padding. But how do I put these inside the graph when calling tf.keras.models.save_model(),
or tell tensorflow that I have some regexes in variables that have to be included in the graph?
4. Yes, I have been doing sequence tagging, multi-label classification and multi-class classification, and this question is aimed at learning to serve those models with tf-serving. So, for example, with multi-label, I want to use the outputs from the tf.keras model,
and if an output is above a 0.5 threshold, label the input text as belonging to a label (texts from a dictionary); and I also have different thresholds for different labels. As in the previous comment: where and how do I include the logic/code for this while saving the model?
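The per-label thresholding described here can be sketched in plain Python as follows; the label names and threshold values are invented for illustration (inside a TF graph the same selection would be expressed with comparison ops and a lookup table, but this shows the intended logic):

```python
LABELS = ["toxicity", "spam", "urgency"]  # hypothetical label names
THRESHOLDS = [0.5, 0.7, 0.3]              # a different threshold per label

def probs_to_labels(probs):
    """Keep each label whose probability meets or exceeds its own threshold."""
    return [lab for lab, p, t in zip(LABELS, probs, THRESHOLDS) if p >= t]

print(probs_to_labels([0.6, 0.65, 0.4]))  # ['toxicity', 'urgency']
```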
2. I didn't know about the SentencePiece and WordPiece tokenizers. Do you mean that these packages/libraries have been useful for you? Sure, I will adopt them.
1 & 3 & 4. After training the model, you can save the graph with pre-processing and post-processing steps like below:
...
...
# some training steps
model = ...
model.compile(...)
model.fit(...)

@tf.function
def inference_function(text):
    # pre-processing: apply the regex substitutions to the raw strings
    text = tf.strings.regex_replace(text, pattern, rewrite)  # placeholders; repeat per regex
    token_ids, starts, ends = tokenizer.tokenize_with_offsets(text)
    model_inputs = ...  # prepare model inputs using token_ids
    # inference
    model_outputs = model(model_inputs)
    outputs = ...  # do some post-processing with starts, ends, and model_outputs
    return outputs

# https://www.tensorflow.org/api_docs/python/tf/keras/Model#save
model.save(
    "some path to save the model",
    signatures={
        "inference_fn": inference_function.get_concrete_function(
            tf.TensorSpec([None], dtype=tf.string)
        ),
    },
)
- Yes! After training the sentencepiece model, you can load and use it with text.SentencepieceTokenizer in the TF graph.