Hello community!
I retrained a Faster R-CNN implementation on segmented images of student answers from a multiple-choice answer sheet. There are five classes: A, B, C, D, and X (for empty or invalid answers). From the detected bounding boxes I derived new boxes around the answer numbers and read them with Tesseract, then discarded invalid text and used linear regression to remove noise.
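A minimal sketch of how the linear-regression noise filter could work, assuming the question numbers in one column increase roughly linearly with the vertical centre of their boxes. `NumberBox`, `fit_line`, `remove_ocr_noise`, and `max_residual` are hypothetical names, and the "drop the worst point and refit" loop is an assumption added so that a single misread number does not skew the fit; it is not necessarily how the step above was implemented.

```python
from dataclasses import dataclass

@dataclass
class NumberBox:
    y_center: float  # vertical centre of the answer-number box, in pixels (hypothetical field)
    number: int      # number as read by Tesseract

def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept) of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # assumes at least two distinct y positions
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def remove_ocr_noise(boxes, max_residual=0.5):
    """Repeatedly drop the box that disagrees most with the fitted line, then refit."""
    kept = list(boxes)
    while len(kept) > 2:
        slope, intercept = fit_line([b.y_center for b in kept], [b.number for b in kept])
        residuals = [abs(b.number - (slope * b.y_center + intercept)) for b in kept]
        worst = max(range(len(kept)), key=lambda i: residuals[i])
        if residuals[worst] <= max_residual:
            break  # every remaining number lies close to the line
        del kept[worst]
    return kept

# Hypothetical column: questions 1-10 spaced 50 px apart, question 5 misread as 8.
boxes = [NumberBox(100 + 50 * (n - 1), n) for n in range(1, 11)]
boxes[4] = NumberBox(300, 8)
print([b.number for b in remove_ocr_noise(boxes)])  # -> [1, 2, 3, 4, 6, 7, 8, 9, 10]
```

Refitting after each removal matters because least squares is sensitive to outliers: one badly misread number pulls the line toward itself and can push the residuals of correct numbers over the threshold.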
I also want the smallest possible architecture, both because I want to run it on a mobile device for real-time scoring and because I think this is a relatively easy problem. That is why I skipped the YOLO architectures. I was able to run the TensorFlow Lite object detection example app on my phone by building it with Xcode on my Mac and deploying it to my iPhone. I also found faster-rcnn-cppe5-lite for TFLite on TF Hub, which has an input layer size of 800 x 1216.
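If a sheet image is resized into a fixed input such as 800 x 1216 while preserving its aspect ratio (letterboxing), the scale, the padding, and the mapping of detections back to the original image can be computed as below. This is a sketch under assumptions: it treats 800 as the height and 1216 as the width, `letterbox_params` and `to_original` are hypothetical names, and the TF Hub model's own preprocessing may resize differently.

```python
def letterbox_params(src_w, src_h, dst_w=1216, dst_h=800):
    """Scale and padding to fit a src_w x src_h image into dst_w x dst_h without distortion."""
    scale = min(dst_w / src_w, dst_h / src_h)  # largest scale at which both sides still fit
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    pad_x = (dst_w - new_w) // 2  # left padding, in pixels
    pad_y = (dst_h - new_h) // 2  # top padding, in pixels
    return scale, new_w, new_h, pad_x, pad_y

def to_original(x, y, scale, pad_x, pad_y):
    """Map a point from model-input coordinates back to the original image."""
    return (x - pad_x) / scale, (y - pad_y) / scale

# Portrait A4 scan at 300 dpi (2480 x 3508 pixels).
scale, new_w, new_h, pad_x, pad_y = letterbox_params(2480, 3508)
print(new_w, new_h, pad_x, pad_y)
```

Under these assumptions a portrait A4 sheet fills the full 800-pixel height but only about 566 of the 1216 columns, which is worth keeping in mind when comparing that model with a square 1024x1024 input.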
Currently, with scanned sheets, I get my highest accuracy on unseen data with a 1024x1024x3 input layer: only one wrong answer in 17 sheets of 40 answers each (1 error in 680 predictions, about 99.85% accuracy). I expect accuracy to drop when I switch from scans to photos of the sheets.
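With only a single error, the 1-in-680 figure is itself uncertain. One quick way to bound it is a Wilson score interval for the error rate; this is a standard binomial interval added here for illustration, and z = 1.96 (95% confidence) is an assumed choice.

```python
import math

def wilson_interval(errors, total, z=1.96):
    """Wilson score interval for a binomial proportion (here: the error rate)."""
    p = errors / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half

sheets, answers_per_sheet, errors = 17, 40, 1
total = sheets * answers_per_sheet  # 680 predictions
accuracy = 1 - errors / total       # 679 / 680
low, high = wilson_interval(errors, total)
print(f"accuracy = {accuracy:.4f}, 95% CI for error rate = [{low:.4f}, {high:.4f}]")
```

For 1 error in 680 predictions this gives an error-rate interval of roughly 0.03% to 0.8%, which is useful context when comparing input sizes on a test set of this size.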
Before I move forward, my question is: what is the most robust way to solve this computer vision problem with deep learning? Am I heading in the right direction?
Thank you,
NA Parra