How to load model for deployment properly?

MSI · December 5, 2021, 6:48am

I am deploying a model after training. In my Flask app, I load it like this:

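# Imports and app setup were omitted in the original post; these are assumed.
from flask import Flask
from tensorflow.keras.models import model_from_json

app = Flask(__name__)
model = None


def seg_model():
    global model
    # Rebuild the architecture from JSON, then load the trained weights.
    with open("modelJ.json", "r") as json_file:
        loaded_model_json = json_file.read()
    model = model_from_json(loaded_model_json)
    model.load_weights("modelWeight.h5")


if __name__ == '__main__':
    seg_model()
    app.run(debug=True)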

Starting the server takes a while, which I think is OK. But the first call to model.predict is also slow, and the log shows libraries being loaded at that point:

2021-12-05 12:45:08.336852: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-12-05 12:45:13.575012: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-12-05 12:45:32.260849: I tensorflow/core/platform/windows/subprocess.cc:308] SubProcess ended with return code: 0

2021-12-05 12:45:32.740390: I tensorflow/core/platform/windows/subprocess.cc:308] SubProcess ended with return code: 0

2021-12-05 12:45:33.551634: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-12-05 12:45:43.796617: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll

How can I overcome this loading time when predicting for the first time?

Tanya · February 7, 2024, 5:03pm

@MSI Welcome to the TensorFlow Forum!

Below are some approaches you can try. Let us know if any of them works well for your use case:

Avoid loading the model inside seg_model(). Instead, load it at module level so it is globally accessible. This ensures the model is loaded once per process, and it is also loaded when the app is started by a WSGI server, where the if __name__ == '__main__' block does not run.
Loading the architecture from JSON and the weights from HDF5 separately is discouraged for production due to potential compatibility issues.
Save your trained model in the TensorFlow SavedModel format, which is self-contained and widely supported. Use model.save('path/to/saved_model') after training to export the model in this format.
For large-scale deployments or complex models, consider TensorFlow Serving. It offers benefits such as model versioning, scalability, and efficient serving.
If required, refer to the TensorFlow Serving documentation for details: Serving Models | TFX | TensorFlow
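
As a rough illustration of the load-once suggestion, here is a minimal sketch. It also adds a warm-up prediction, which is not part of the suggestions above but is a common way to move the first-call delay from the question into startup. The names load_and_warm_up, load_keras_model, and StubModel are hypothetical, and the stub stands in for the real model so the flow can be run without TensorFlow installed.

```python
def load_and_warm_up(loader, dummy_batch):
    """Load a model and run one throwaway prediction on it.

    The first predict() call pays one-time costs (GPU library loading,
    graph tracing), so paying them at startup keeps them off the first
    real request.
    """
    model = loader()
    model.predict(dummy_batch)
    return model


def load_keras_model(path="path/to/saved_model"):
    # Hypothetical helper: TensorFlow is imported lazily so the rest of
    # this sketch runs without it installed.
    import tensorflow as tf
    return tf.keras.models.load_model(path)


class StubModel:
    """Stand-in for a Keras model, used only to demonstrate the flow."""

    def __init__(self):
        self.predict_calls = 0

    def predict(self, batch):
        self.predict_calls += 1
        return [0.0 for _ in batch]


# Module level: runs once per process, before any request is served.
model = load_and_warm_up(StubModel, dummy_batch=[[0.0, 0.0]])
print(model.predict_calls)  # 1: the warm-up call already happened
```

In the Flask app, you would pass load_keras_model instead of StubModel and use a zero-filled array matching the model's input shape as the dummy batch (for example np.zeros((1, 256, 256, 3)), where the shape is only an assumption). Note also that app.run(debug=True) starts Flask's reloader, which runs the script in a second process, so the model is loaded twice in debug mode unless use_reloader=False is passed.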

Let us know what works well for you.
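
If you go the TensorFlow Serving route, the Flask app no longer holds the model and instead calls the serving process over HTTP. The sketch below shows what such a call could look like against TensorFlow Serving's REST predict endpoint, using only the standard library. The model name seg_model, the host, and the helper names are assumptions for illustration; 8501 is the port commonly used for the REST API.

```python
import json
import urllib.request


def build_predict_request(instances, model_name="seg_model",
                          host="localhost", port=8501):
    """Build a POST request for TensorFlow Serving's REST predict endpoint."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    # The REST API expects a JSON body with one entry per input example.
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(instances, **kwargs):
    """Send the request and return the 'predictions' field of the reply."""
    request = build_predict_request(instances, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["predictions"]


# Building the request does not need a running server.
example = build_predict_request([[0.0, 0.0]])
print(example.full_url)
```

Calling predict(...) requires a TensorFlow Serving instance that is already running and serving a model under the chosen name. Because that process keeps the model loaded, restarting the Flask app does not reload it.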
