I’m deploying a large Keras model to production, and it’s not clear to me whether I should do anything to make it more efficient for inference. In TF 1.x I used to prepare frozen graphs, but as far as I understand that workflow is deprecated in TF 2.x.
Right now, I’m just loading the model and using the .predict() method to perform inference on .tfrec files.
from tensorflow import keras

model = keras.models.load_model('path/to/location')
# get_dataset is my helper that builds a tf.data.Dataset from the TFRecord files
predictions = model.predict(get_dataset(tfrecord_list, batch_size), verbose=0)
This model won’t be trained any further, so I’d like to know whether there are any production-specific steps I should take to improve its computational performance (if that’s possible).
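For context, one option I’ve been considering is wrapping the forward pass in a tf.function so inference runs as a compiled graph rather than eagerly. A minimal sketch with a toy stand-in model (the shapes and layer are placeholders, not my actual model):

```python
import tensorflow as tf

# Placeholder model standing in for the real one.
model = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(8,))])

# Fixing the input signature avoids retracing and gives graph execution
# for every call, which is usually faster than eager .call() in a loop.
@tf.function(input_signature=[tf.TensorSpec([None, 8], tf.float32)])
def serve(x):
    return model(x, training=False)

out = serve(tf.zeros([2, 8]))  # runs as a traced graph
```

Note that model.predict() already builds an internal tf.function, so gains from this are mostly for custom inference loops.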
Hi @Kiran_Sai_Ramineni. TensorFlow Serving is aimed at web services, right? My model will be distributed within a Python package, so I’m not sure that’s the way to go.
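To be concrete about what I mean by shipping it in a package: I could export a SavedModel and load it locally, no server involved. A sketch assuming a toy stand-in model (input shape and layers are placeholders):

```python
import tempfile

import tensorflow as tf

# Placeholder model standing in for the real one.
model = tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=(3,))])

class Predictor(tf.Module):
    """Wraps the model so the exported artifact is a serialized graph."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    @tf.function(input_signature=[tf.TensorSpec([None, 3], tf.float32)])
    def __call__(self, x):
        return self.model(x, training=False)

export_dir = tempfile.mkdtemp()
tf.saved_model.save(Predictor(model), export_dir)

# The package would ship export_dir and load it like this:
loaded = tf.saved_model.load(export_dir)
out = loaded(tf.zeros([1, 3]))
```

This gives a frozen-graph-like artifact in the TF2 style, without needing a Serving endpoint.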
InvalidArgumentError: Graph execution error:
Detected at node 'StatefulPartitionedCall' defined at (most recent call last):
…
I’m not posting the whole log here because it’s off topic. I’m using a complex custom layer that is probably causing this; I’ll try to track down the root of the problem.
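For anyone hitting a similar error: one common cause is a custom layer that isn’t registered for serialization, so the reloaded model misbehaves at predict time. A hedged sketch with a made-up Scale layer (not my actual layer):

```python
import tensorflow as tf
from tensorflow import keras

# Registering the layer lets load_model rebuild it without passing
# custom_objects explicitly; get_config must round-trip all __init__ args.
@keras.utils.register_keras_serializable()
class Scale(keras.layers.Layer):
    def __init__(self, factor=2.0, **kwargs):
        super().__init__(**kwargs)
        self.factor = factor

    def call(self, inputs):
        return inputs * self.factor

    def get_config(self):
        return {**super().get_config(), "factor": self.factor}
```

Alternatively, keras.models.load_model(path, custom_objects={"Scale": Scale}) works without the decorator.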