I built a multiclass text classifier. I save the tokenizer and the model like this:
# Save the Tokenizer
from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
tokenizer_json = tokenizer.to_json()
with open('tokenizer.json', 'w', encoding='utf-8') as f:
    f.write(tokenizer_json)
# Save the model
model.save('my_model')
But when I reload the model and evaluate a new text, the result is not good.
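import tensorflow as tf
from tensorflow.keras.models import load_model

# Load the Tokenizer
with open('tokenizer.json', 'r', encoding='utf-8') as f:
    tokenizer_json = f.read()
tokenizer = tf.keras.preprocessing.text.tokenizer_from_json(tokenizer_json)
# Load the model (modelpath is the path passed to model.save(), e.g. 'my_model')
model = load_model(modelpath, custom_objects=None)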
Hi @Francesca_Pisani. You’re nearly there, I believe. I’m not the greatest expert around here in text classification, but here is how I would proceed after loading both your saved model and your tokenizer (what I’m showing is mostly pseudocode):
Preprocess the new dataset using the loaded tokenizer.
new_data = ['Text 1', 'Text 2', ...]  # Your new dataset
sequences = tokenizer.texts_to_sequences(new_data)
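The usual next steps after tokenizing are to pad the sequences to the same length that was used during training, run the model, and take the highest-scoring class for each text. A length or padding-side mismatch between training and inference is one possible cause of poor predictions after reloading, though I can't confirm that is what is happening here. The sketch below is dependency-free so the logic is easy to follow: `pad_pre` and `argmax` are hypothetical helpers (not Keras API), `MAX_LEN` is an assumed value, and the sequences and probabilities are made-up stand-ins for the output of `texts_to_sequences` and `model.predict`. In real code, Keras's `pad_sequences` with a `maxlen` argument does this padding for you, and it pads at the front by default.

```python
# Dependency-free sketch; pad_pre and argmax are hypothetical helpers, not Keras API.

def pad_pre(sequences, maxlen, value=0):
    """Left-pad (and left-truncate) each sequence to exactly maxlen items."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # Keep only the last maxlen tokens
            seq = seq[len(seq) - maxlen:]
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


def argmax(row):
    """Return the index of the largest value in row."""
    return max(range(len(row)), key=lambda i: row[i])


MAX_LEN = 5  # assumption: must equal the sequence length used at training time

# Made-up stand-in for tokenizer.texts_to_sequences(new_data)
sequences = [[4, 9], [7, 1, 3, 8, 2, 6]]
padded = pad_pre(sequences, MAX_LEN)

# Made-up stand-in for model.predict(padded): one row of class scores per text
probabilities = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
predicted = [argmax(row) for row in probabilities]

print(padded)
print(predicted)
```

The predicted indices only mean something if they are mapped back to labels with the same label encoding that was used for training, so it is worth saving that mapping alongside the tokenizer.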