Questions about choice of model and method, annotations and PDF in general

Hi,

To start off, English is not my native language, so if there is anything I should rephrase or explain further, please just let me know.

I would like to create an application of some sort where I can load a PDF file consisting of lines, numbers and some other shapes and have them “identified” so that I can extract information from them.

Using my kids' crappy pencils, I have tried to draw something that resembles what I'm looking to create. At the top is the original PDF, and at the bottom different colors mark the different kinds of objects.

I have used TensorFlow models before to identify objects in videos, but never on a PDF.
So the perfect solution for me, if it exists, is to train a model to recognise the individual lines and the individual numbers, and also to get the positions of the lines and numbers so that I can “bind” these two or three together. I would also like to train the model on lines that cross each other, so that it knows which line continues into which at a crossing (the lines always go straight through a crossing, but there are crossings).

  1. Do you think something like this is possible?

And now for my collection of unsorted, probably kind of stupid and hard-to-answer questions:

  1. I would also like to define some sort of ROI, in this case at the bottom, since that is where the information about the current game/object being analyzed is located. Could this be done? Could you use multiple ROIs?

  2. To get good positions for the lines, I guess instance segmentation is the way to go? Does anyone have ideas for models to use or annotation techniques to look into?

  3. I'm planning to do this in Python and to build a simple GUI with some buttons such as "load a new document", "quit" and so on. Does anyone have recommendations for a GUI toolkit that works well for this?

Any information and any ideas are more than welcome, not only answers to the questions. This is still just an idea, but I think about it a bit too much not to at least try to find out if it's possible. The scope is still very far from set.

Best,

Martin

Hi @Martin_L,

Sorry for the delayed response. It's an interesting idea, and it is possible. For your use case, first convert the PDF into images using the pdf2image Python library. Then, to recognize the individual lines, numbers and shapes, you can use an object detection model. Please refer to this tutorial.
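As a rough sketch of the conversion step, and of the fixed bottom ROI asked about in the question, something like the following might work. The helper names `bottom_roi_box` and `pdf_to_page_images`, the 20% strip height, the 200 dpi setting and the output file names are all assumptions made for illustration. pdf2image also needs Poppler installed on the system.

```python
from pathlib import Path


def bottom_roi_box(width, height, fraction=0.2):
    """Return a (left, upper, right, lower) box covering the bottom strip of a page.

    `fraction` is the share of the page height to keep (0.2 is only a guess
    at where the info block sits; adjust it for your documents).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    upper = int(round(height * (1 - fraction)))
    return (0, upper, width, height)


def pdf_to_page_images(pdf_path, out_dir, dpi=200):
    """Render every PDF page to a PNG and save a cropped copy of the bottom ROI."""
    # Imported lazily so bottom_roi_box can be used without pdf2image installed.
    from pdf2image import convert_from_path  # requires Poppler

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    pages = convert_from_path(str(pdf_path), dpi=dpi)  # list of PIL images
    for i, page in enumerate(pages, start=1):
        page.save(out_dir / f"page_{i:03d}.png")
        roi = page.crop(bottom_roi_box(page.width, page.height))
        roi.save(out_dir / f"page_{i:03d}_roi.png")
    return len(pages)
```

Several ROIs can be handled the same way by cropping with more than one box. The full-page images are what you would annotate and feed to the detector.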

  1. Object detection models require annotated data, so first annotate your desired objects (lines, numbers, shapes, crossing lines, etc.) using an annotation tool.
  2. Feed the annotated data (JSON/XML) to the object detection model and train it.
  3. In two-stage detectors such as Faster R-CNN, the Region Proposal Network generates many RoIs around possible objects. Only the candidate RoIs that pass the IoU-based filtering are processed in the next stage.
  4. For the GUI, Python's Tkinter can be helpful.
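To make the IoU score mentioned in point 3 concrete, here is a minimal sketch of how intersection over union is computed for two axis-aligned boxes. The `(x1, y1, x2, y2)` corner format and the function name `iou` are assumptions for illustration. Detection frameworks ship their own implementations and may use a different box layout.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1)
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

One thing to keep in mind for your drawings: a long, thin diagonal line has a bounding box that is mostly empty, so box IoU can behave poorly there. That is one reason the instance segmentation idea from your question may be worth a look for the lines.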

Thank You