Bad results when I reload a saved model

I built a multiclass text classifier, and I save the tokenizer and the model like this:

import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer

# Save the Tokenizer
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
tokenizer_json = tokenizer.to_json()
with open('tokenizer.json', 'w', encoding='utf-8') as f:
    f.write(tokenizer_json)

# Save the model
model.save('my_model')
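One likely cause of poor predictions after reloading is that new text is not preprocessed exactly as it was during training, in particular the `maxlen` passed to `pad_sequences`. A minimal sketch, assuming you store those settings in a hypothetical `preprocessing.json` next to `tokenizer.json` (the file name, keys and the placeholder length 100 are illustrative, not a Keras convention):

```python
import json

# Hypothetical file name: any path works as long as training and inference agree.
CONFIG_PATH = 'preprocessing.json'


def save_preprocessing_config(path, maxlen, padding='pre', truncating='pre'):
    """Store the pad_sequences settings used during training."""
    config = {'maxlen': maxlen, 'padding': padding, 'truncating': truncating}
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(config, f)


def load_preprocessing_config(path):
    """Read the settings back so inference pads exactly like training."""
    with open(path, 'r', encoding='utf-8') as f:
        return json.load(f)


# At training time (100 is a placeholder for your real sequence length):
save_preprocessing_config(CONFIG_PATH, maxlen=100)

# At inference time, the keys match pad_sequences keyword arguments:
config = load_preprocessing_config(CONFIG_PATH)
# padded = pad_sequences(sequences, **config)
```

Keeping the keys identical to the `pad_sequences` argument names lets the loaded dict be passed straight through with `**config`, so the value cannot drift between the training and inference scripts.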

But when I reload the model and use it to classify new text, the predictions are poor.

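import tensorflow as tf
from tensorflow.keras.models import load_model

# Load the Tokenizer
with open('tokenizer.json', 'r', encoding='utf-8') as f:
    tokenizer_json = f.read()
tokenizer = tf.keras.preprocessing.text.tokenizer_from_json(tokenizer_json)

# Load the model from the same path that was passed to model.save()
modelpath = 'my_model'
model = load_model(modelpath)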

Can you help me?

Hi @Francesca_Pisani. You're nearly there, I believe. I'm not the greatest expert around here in text classification, but here is how I would proceed once you have loaded both your saved model and your tokenizer (what follows is mostly pseudocode):

  1. Preprocess the new dataset using the loaded tokenizer.

new_data = ['Text 1', 'Text 2', …]  # Your new dataset
sequences = tokenizer.texts_to_sequences(new_data)

  2. Pad the sequences to the same length used during training:

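from tensorflow.keras.preprocessing.sequence import pad_sequences

max_sequence_length = …  # Must match the maxlen (and padding/truncating settings) used during training
padded_sequences = pad_sequences(sequences, maxlen=max_sequence_length)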

  3. Use the loaded model to predict on the new dataset you just prepared:

predictions = model.predict(padded_sequences)
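To see why the padding length matters, here is a pure-Python sketch that mirrors the default behaviour of Keras `pad_sequences` (`padding='pre'`, `truncating='pre'`, pad value 0); it is an illustration, not the library's code. It also shows one way to turn the probability rows returned by `model.predict` into class names with an argmax, assuming a hypothetical `class_names` list in the same order as the labels used during training.

```python
def pad_pre(sequences, maxlen, value=0):
    """Mimic pad_sequences defaults: keep the last `maxlen` tokens, pad on the left."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]  # truncating='pre' drops tokens from the start
        padded.append([value] * (maxlen - len(seq)) + seq)  # padding='pre'
    return padded


def decode_predictions(probabilities, class_names):
    """Map each row of class probabilities to the name of its highest-scoring class."""
    return [class_names[max(range(len(row)), key=row.__getitem__)] for row in probabilities]


tokens = [[5, 8, 2, 9]]
print(pad_pre(tokens, maxlen=6))  # [[0, 0, 5, 8, 2, 9]]
print(pad_pre(tokens, maxlen=3))  # [[8, 2, 9]] (a shorter maxlen silently drops tokens)

# Hypothetical labels and model outputs, in the same order used at training time.
class_names = ['sport', 'politics', 'tech']
probs = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
print(decode_predictions(probs, class_names))  # ['politics', 'sport']
```

With the NumPy array that `model.predict` actually returns, `np.argmax(predictions, axis=1)` gives the same class indices in one call; the important part is that the index-to-label mapping is the one used when the model was trained.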

Thank you very much :+1:
