I have a model quantized with float32. After converting it to a TFLite model, it predicts perfectly on a single image, but when I use it inside a while loop it throws an error. I tried to follow TensorFlow's instructions here but didn't understand their approach.
CODE:
```python
import cv2
import numpy as np
import tensorflow as tf
from flask import Flask

app = Flask(__name__)

def generate_frames(frame):
    while True:
        image = cv2.resize(frame, (256, 256))
        # converting into float32 in [0, 1]
        image = tf.image.convert_image_dtype((image / 255.0), dtype=tf.float32).numpy()
        image = run_inference(np.expand_dims(image[:, :, :3], axis=0))
        final_result = (image * 255).astype(np.uint8)
        ret, buffer = cv2.imencode('.jpg', final_result)
        frame = buffer.tobytes()
        return frame  # NOTE: this returns on the first pass, so the loop runs once

# load model
def load_trained_model():
    global interpreter, input_details, output_details
    interpreter = tf.lite.Interpreter(model_path="quant_model.tflite")
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

def run_inference(image):
    # perform inference and parse the outputs
    interpreter.set_tensor(input_details[0]['index'], image)
    interpreter.invoke()
    outputs = interpreter.get_tensor(output_details[0]['index'])[0]
    return outputs

if __name__ == '__main__':
    load_trained_model()
    app.run(debug=True)
```
ERROR:
```
RuntimeError: There is at least 1 reference to internal data in the
interpreter in the form of a NumPy array or slice. Be sure to only
hold the function returned from tensor() if you are using raw data
access.
```
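For context, this guard trips whenever a live NumPy view of an interpreter-owned buffer still exists at the moment a call that may invalidate those buffers (such as `allocate_tensors()` or `invoke()`) is made. A minimal sketch of the pattern that raises it (not taken from the code above):

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="quant_model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()

# WRONG: the trailing () turns the accessor returned by tensor() into a
# NumPy view of the interpreter's internal buffer; while that view is
# alive, any call that may move the buffers refuses to run.
in0 = interpreter.tensor(input_details[0]['index'])()

interpreter.allocate_tensors()  # raises the RuntimeError shown above
```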
@Bhack Thanks for the source. As far as I understood, we need to delete the internal buffer reference after each iteration. From interpreter_test.py, it looks like we need to perform a "del in0" operation, but I am confused about how to do it. Can you give me a hint?
```python
interpreter = tf.lite.Interpreter(model_path="quant_model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def run_inference(image):
    # perform inference and parse the outputs
    interpreter.set_tensor(input_details[0]['index'], image)
    interpreter.invoke()
    outputs = interpreter.get_tensor(output_details[0]['index'])[0]
    # I think I need to perform the buffer delete operation here (but how?)
    return outputs
```
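As a hint, here is a minimal sketch of how the `del` from interpreter_test.py is meant to be used, assuming you switch run_inference to the raw tensor() access (with set_tensor/get_tensor, the copy APIs, there is no view to delete):

```python
def run_inference(image):
    # raw NumPy view of the input buffer (note the trailing ())
    in0 = interpreter.tensor(input_details[0]['index'])()
    in0[:] = image        # fill the input in place
    del in0               # release the view before the next guarded call
    interpreter.invoke()  # the safety check passes now
    return interpreter.get_tensor(output_details[0]['index'])[0]
```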
If you look, many of these operations have a safety guard; you can find the description of the check here:
I don't think the problem is in set_tensor and get_tensor, as they are the slow (copy) APIs rather than tensor().
Have you checked whether holding input_details and output_details ends up being similar to the WRONG pattern explained at:
This could also explain why it probably worked when you tried the whole gist in a single function: those references were confined to the function scope.
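For reference, a sketch of the safe raw-access pattern the error message points at: hold only the function returned by tensor(), and create the view fresh inside each iteration so it never outlives a single statement. `frames` here is a hypothetical iterable of preprocessed float32 batches like the one built in the code above:

```python
input_fn = interpreter.tensor(input_details[0]['index'])   # function, no trailing ()
output_fn = interpreter.tensor(output_details[0]['index'])

for image in frames:
    input_fn()[:] = image        # view created and dropped within the statement
    interpreter.invoke()         # no lingering reference, so the guard passes
    result = output_fn().copy()  # copy the output out before the next iteration
```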
P.S. If it is still slow because you have to load and recreate the interpreter on each request (its lifecycle ends with the request), you could try running a TF Serving instance and consuming it with Flask:
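A sketch of that setup, assuming the model is exported in a form TF Serving can load and is served under the (hypothetical) name quant_model on the default REST port 8501; the Flask route only forwards the preprocessed batch:

```python
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
SERVING_URL = "http://localhost:8501/v1/models/quant_model:predict"  # assumption

@app.route('/predict', methods=['POST'])
def predict():
    # nested lists with shape (1, 256, 256, 3), matching the model input
    batch = request.get_json()['instances']
    resp = requests.post(SERVING_URL, json={"instances": batch})
    return jsonify(resp.json()["predictions"])
```

This keeps the interpreter's lifecycle out of the request handler entirely: TF Serving owns the model, and Flask only does I/O.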