Hi all,
I am doing cloud cost optimization on AWS and am currently exploring deploying TFLite models on AWS t4g instances, which use AWS's custom ARM chip, Graviton2. They pack roughly 2x the performance at half the cost compared to t3 instances. I have built a basic deployment container for Inception-v3 pretrained on ImageNet (no fine-tuning) using FastAPI.
The Inception-v3 TFLite model is quantized to float16. The deployment code is below.
```python
import tflite_runtime.interpreter as tflite
import numpy as np
from PIL import Image
import copy


def load_model(model_path):
    """Loads the TFLite model; the returned interpreter is held as a global."""
    model_interpreter = tflite.Interpreter(model_path)
    return model_interpreter


def infer(image, model_interpreter):
    """Runs inference on the input PIL image."""
    model_interpreter.allocate_tensors()
    input_details = model_interpreter.get_input_details()[0]
    output_details = model_interpreter.get_output_details()[0]
    # Raw tensor accessors; these hold references to the interpreter's
    # internal buffers.
    input_tensor = model_interpreter.tensor(input_details["index"])
    output_tensor = model_interpreter.tensor(output_details["index"])
    # Resize to the model's spatial dims, e.g. (299, 299) for Inception-v3
    image = image.resize(tuple(input_details["shape"][1:-1]))
    image = np.asarray(image, dtype=np.float32)
    image = np.expand_dims(image, 0)
    image = image / 255
    input_tensor()[:] = image
    model_interpreter.invoke()
    # Deep-copy so we don't hand out a view into the interpreter's memory
    results = copy.deepcopy(output_tensor())
    return results
```
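For completeness, the FastAPI wiring looks roughly like this; the endpoint name `/predict` and the model path are placeholders for what my app actually uses:

```python
import io

from fastapi import FastAPI, UploadFile
from PIL import Image

app = FastAPI()
# Interpreter created once and shared across requests as a global
model = load_model("inception_v3_float16.tflite")  # path is a placeholder


@app.post("/predict")
async def predict(file: UploadFile):
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")
    return {"scores": infer(image, model).tolist()}
```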
When I run my load-testing program (written with Locust, a Python load-testing package), I don't see any errors until the user count goes above 3. After that I start seeing the runtime error below:
```
RuntimeError: There is at least 1 reference to internal data
in the interpreter in the form of a numpy array or slice. Be sure to
only hold the function returned from tensor() if you are using raw
data access.
```
I don't see the error on every request; it is random in nature.
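For context, the load test is roughly this shape; the `/predict` endpoint, multipart field name, and sample image are assumptions about my app, not part of Locust itself:

```python
from locust import HttpUser, task, between


class InferenceUser(HttpUser):
    # Each simulated user waits 1-2 s between requests
    wait_time = between(1, 2)

    @task
    def predict(self):
        # POST a test image to the inference endpoint
        with open("sample.jpg", "rb") as f:
            self.client.post("/predict", files={"file": f})
```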
When I moved the model-loading lines inside the infer function (sketched below), I stopped seeing this error. But this loads a new model on every request. That is fine while the model is small, but as model size grows it will add a significant amount to the inference time.
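Concretely, the workaround version looks roughly like this:

```python
def infer(image, model_path):
    # Workaround: build a fresh interpreter per request instead of
    # sharing a global one. This avoids the RuntimeError but pays the
    # model-load cost on every call.
    model_interpreter = tflite.Interpreter(model_path)
    model_interpreter.allocate_tensors()
    input_details = model_interpreter.get_input_details()[0]
    output_details = model_interpreter.get_output_details()[0]
    input_tensor = model_interpreter.tensor(input_details["index"])
    output_tensor = model_interpreter.tensor(output_details["index"])
    image = image.resize(tuple(input_details["shape"][1:-1]))
    x = np.expand_dims(np.asarray(image, dtype=np.float32) / 255, 0)
    input_tensor()[:] = x
    model_interpreter.invoke()
    return copy.deepcopy(output_tensor())
```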
Is there a way to solve this random error while keeping the TFLite model as a global variable?
Thanks.
EDIT:
I tried out the signature runner API (see the sketch below). I didn't get the error, but for some reason my Docker container kept exiting without any error. I think it was due to a lack of memory.
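This is roughly what the signature-runner variant looked like. The signature key "serving_default" and the input name "input" are assumptions here; the actual names come from the converted model's signatures:

```python
interpreter = tflite.Interpreter(model_path)
runner = interpreter.get_signature_runner("serving_default")


def infer_sig(image):
    # image assumed already resized to the model's input size,
    # (299, 299) for Inception-v3
    x = np.expand_dims(np.asarray(image, dtype=np.float32) / 255, 0)
    out = runner(input=x)          # keyword must match the signature input name
    return list(out.values())[0]  # runner returns a dict: output name -> array
```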
Does the signature runner create multiple interpreter instances on different threads while the current interpreter is running, thereby using up all the available RAM?