I’m trying to quickly load a model to make predictions in a REST API. The tf.keras.models.load_model method takes ~1s, which is too slow for what I’m trying to do.
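For context, a minimal sketch of how that load time can be measured (the "model.h5" path here is just a placeholder, not my actual file):

```python
import time
import tensorflow as tf

# Time a single cold call to load_model; this is where the ~1s goes.
t0 = time.perf_counter()
model = tf.keras.models.load_model("model.h5")
print(f"load_model took {time.perf_counter() - t0:.2f}s")
```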
What is the fastest way to load a model for inference only?
I know there is a TFX serving server that does exactly this efficiently, but I already have a REST API for doing other things, and setting up a specialised server just for predictions feels like overkill. How does the TFX server handle this?
Yes, there might be no other way. However, I’m not sure whether the TFX server loads the model from disk on every request.
What I’m trying to achieve is either to find a very quick way to load the model from disk, or to keep the model in memory somehow so it doesn’t need to be loaded on every request.
I also tried caching, but pickle deserialisation is very expensive and adds ~1.2s. I suspect the built-in load_model does some sort of deserialisation too, which seems to be the killer.
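To illustrate what I mean by keeping the model in memory, here is a minimal sketch assuming Flask and an H5 model file (the framework, path, and endpoint are placeholders, not necessarily what I’m using): the model is loaded once when the process starts and then reused for every prediction request.

```python
import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request

app = Flask(__name__)

# Loaded once at startup (~1s hit here), then kept in memory for all requests.
model = tf.keras.models.load_model("model.h5")

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"inputs": [[...], [...]]}
    inputs = np.asarray(request.get_json()["inputs"], dtype=np.float32)
    preds = model.predict(inputs)
    return jsonify({"predictions": preds.tolist()})

if __name__ == "__main__":
    app.run()
```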