How to make TFLite handle multiple API calls simultaneously?

Hi all,

I am working on cloud cost optimization on AWS and am currently exploring deployment of TFLite models on AWS t4g instances, which use AWS's custom ARM chip, Graviton2. They offer roughly 2x the performance at half the cost compared to t3 instances. I have created a basic deployment container for InceptionNet-v3 pretrained on ImageNet (no fine-tuning) using FastAPI.

The InceptionNet-v3 TFLite model is quantized to float16. I have shared the deployment code below.

import tflite_runtime.interpreter as tflite
import numpy as np
from PIL import Image
import copy

def load_model(model_path):
    '''
    Loads the TFLite model once so it can be kept as a global variable
    '''
    model_interpreter = tflite.Interpreter(model_path)
    return model_interpreter

def infer(image, model_interpreter):
    '''
    Runs inference on the input image
    '''
    model_interpreter.allocate_tensors()
    input_details = model_interpreter.get_input_details()[0]
    output_details = model_interpreter.get_output_details()[0]
    # tensor() returns callables that give views into the interpreter's internal buffers
    input_tensor = model_interpreter.tensor(input_details["index"])
    output_tensor = model_interpreter.tensor(output_details["index"])

    # Resize to the model's spatial input size, add a batch dimension and scale to [0, 1]
    image = image.resize(tuple(input_details["shape"][1:-1]))
    image = np.asarray(image, dtype=np.float32)
    image = np.expand_dims(image, 0)
    image = image / 255

    input_tensor()[:] = image
    model_interpreter.invoke()
    # Copy the output so no reference to the interpreter's internal buffer is kept
    results = copy.deepcopy(output_tensor())

    return results
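
For completeness, the helpers are called from a FastAPI endpoint roughly like this. This is a simplified sketch, not the exact serving code; the route, file handling, and model path here are illustrative:

import io
from fastapi import FastAPI, File, UploadFile
from PIL import Image

app = FastAPI()
model_interpreter = load_model("inception_v3_float16.tflite")  # illustrative path

@app.post("/predict")
def predict(file: UploadFile = File(...)):
    # FastAPI runs sync endpoints in a thread pool, so several requests
    # can reach infer() on the shared interpreter at the same time.
    image = Image.open(io.BytesIO(file.file.read())).convert("RGB")
    results = infer(image, model_interpreter)
    return {"predictions": results.tolist()}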

When I run my load-testing program (written with Locust, a Python load-testing package), I don't see any errors until the user count goes above 3. After that I start seeing the runtime error below:

RuntimeError: There is at least 1 reference to internal data
      in the interpreter in the form of a numpy array or slice. Be sure to
      only hold the function returned from tensor() if you are using raw
      data access.

I don't see the error on every request; it occurs randomly.
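
The load test is essentially this (simplified; the wait time and sample image are illustrative):

from locust import HttpUser, task, between

class InferenceUser(HttpUser):
    wait_time = between(0.5, 2)

    @task
    def predict(self):
        # Endpoint path and sample image are illustrative.
        with open("sample.jpg", "rb") as f:
            self.client.post("/predict", files={"file": f})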

When I moved the model-loading lines inside the infer function (roughly the sketch below), the error went away. But that loads a new interpreter on every request. This is fine while the model is small, but as the model grows it will add a significant amount to the inference time.
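
The per-request variant is essentially this (the function name is just for illustration):

def infer_per_request(image, model_path):
    '''
    Workaround: build a fresh interpreter for every request instead of
    sharing a global one. Avoids the concurrency error, but pays the
    model-loading cost on each call.
    '''
    model_interpreter = tflite.Interpreter(model_path)
    return infer(image, model_interpreter)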

Is there a way to avoid this intermittent error while keeping the TFLite interpreter as a global variable?

Thanks.

EDIT:
I tried the signature runner API. I no longer got the error, but for some reason my Docker container kept exiting without any error message. I think it was running out of memory.
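
What I tried looks roughly like this (simplified; the "serving_default" key and the preprocessing shown here are illustrative):

def infer_with_signature(image, model_interpreter):
    '''
    Variant of infer() that goes through the signature runner API
    '''
    # "serving_default" is the usual signature key; adjust if the model
    # was converted with a different one.
    runner = model_interpreter.get_signature_runner("serving_default")
    input_name = next(iter(runner.get_input_details()))

    image = image.resize((299, 299))  # InceptionNet-v3 input size
    image = np.asarray(image, dtype=np.float32)
    image = np.expand_dims(image, 0) / 255

    # The runner returns a dict keyed by output name; take the first entry.
    outputs = runner(**{input_name: image})
    return next(iter(outputs.values()))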

Does the signature runner create multiple interpreter instances in different threads while the current interpreter is running, thereby using up all the available RAM?

Hi, @Prikmm

I apologize for the delayed response. If you still need help with this issue, could you please run the performance benchmark on the model with --enable_op_profiling=true? It will give much more detail.

Please refer to the official documentation for performance best practices. Also, to confirm, did you try the workaround suggested in this Stack Overflow thread? If not, please give it a try and see whether it resolves your issue.

If the issue still persists after trying with the latest version, please let us know with more information so we can investigate this issue further from our end.

Thank you for your cooperation and patience.