Hi there! I discovered the MoveNet pose estimation model a few days ago, and I’ve been following these tutorials to get the hang of it:
https://www.tensorflow.org/hub/tutorials/movenet
https://www.tensorflow.org/lite/tutorials/pose_classification
The first of those was pretty straightforward, and I could successfully run it on a few of my own training images to check whether the model was good for my purposes. That tutorial uses the TF Hub model, which outputs a numpy array with the keypoints found in the image.
The problem I have is with the second one. It also seems pretty straightforward, but all of the pose classification scripts treat the output of the network (keypoint_with_scores) as a numpy array, while the TFLite version of the model they use returns a Person object (a NamedTuple) instead. I tried to work around this by reading the keypoint values out of the Person object, but found that the keypoints are themselves stored as a list of NamedTuples.
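For reference, here is a minimal sketch of the kind of conversion I have in mind. The Point, KeyPoint and Person classes below are simplified stand-ins that I wrote myself, assuming each keypoint carries a coordinate (x, y) and a score; the real classes in the tutorial code may have more fields or different names, and person_to_array is just a hypothetical helper name.

```python
from typing import List, NamedTuple

import numpy as np


# Simplified stand-ins for the tutorial's data classes.
# ASSUMPTION: field names and layout are guesses, reduced to what is used here.
class Point(NamedTuple):
    x: float
    y: float


class KeyPoint(NamedTuple):
    coordinate: Point
    score: float


class Person(NamedTuple):
    keypoints: List[KeyPoint]
    score: float


def person_to_array(person: Person) -> np.ndarray:
    """Flatten a Person into a (num_keypoints, 3) array of [x, y, score] rows."""
    return np.array(
        [[kp.coordinate.x, kp.coordinate.y, kp.score] for kp in person.keypoints],
        dtype=np.float32,
    )


# Made-up values, only to show the resulting shape.
person = Person(
    keypoints=[KeyPoint(Point(10.0, 20.0), 0.9), KeyPoint(Point(30.0, 40.0), 0.5)],
    score=0.7,
)
keypoints_array = person_to_array(person)
print(keypoints_array.shape)  # (2, 3)
```

Even with a conversion like this, the result may not be a drop-in replacement: as far as I can tell, the numpy output in the first tutorial has shape [1, 1, 17, 3] with rows ordered [y, x, score] and coordinates normalized to the image size, so the column order and units would still need checking.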
I’ve been trying to find more info about this mismatch between the TF Hub and TFLite models but couldn’t find anything, so I’m posting here to see if anyone knows why the outputs of the two models are so different.