How to get started analyzing HTML documents

I am new to ML. How can I get started on solving the following problem with TF / Keras?

I have thousands of HTML documents, each containing information about a university course. Somewhere in each document will be the name(s) of the course tutor(s). This is the detail I need to extract from each document.

I specifically want to use AI for this task. I want to build a model that, given an HTML document, will find the tutor name in that document.

How can I get started with such a task? Thank you.

Hi @ChrisTaylorDeveloper, welcome to the forum!

Do you have a labeled dataset where each document has annotated tutor names? This is key for training a supervised ML model. If you don’t already have such annotations, you will need to create them. Each document should have a label indicating where the tutor names are in the document. You can refer to a Named Entity Recognition (NER) tutorial, since NER is similar to your task.
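To make the labeling step concrete, here is a minimal sketch of how one annotated document could be turned into token-level labels of the kind NER models are usually trained on. It uses only the Python standard library. The sample HTML, the tutor name "Jane Smith", the helper names, and the `B-TUTOR` / `I-TUTOR` / `O` tag names are all made up for illustration (they follow the common BIO tagging convention), and the exact-match labeling assumes you already know the tutor names for each training document.

```python
import re
from html.parser import HTMLParser

TOKEN_PATTERN = r"\w+|[^\w\s]"  # words, or single punctuation characters


class TextExtractor(HTMLParser):
    """Collects visible text chunks, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_tokens(html):
    """Strip the markup and split the remaining text into tokens."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(TOKEN_PATTERN, " ".join(parser.chunks))


def bio_labels(tokens, tutor_names):
    """Tag every exact occurrence of a known tutor name with B-/I- labels."""
    labels = ["O"] * len(tokens)
    for name in tutor_names:
        name_tokens = re.findall(TOKEN_PATTERN, name)
        n = len(name_tokens)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == name_tokens:
                labels[i] = "B-TUTOR"  # first token of the name
                for j in range(i + 1, i + n):
                    labels[j] = "I-TUTOR"  # remaining tokens of the name
    return labels


# Hypothetical example document and annotation.
sample_html = """
<html><body>
  <h1>Introduction to Statistics</h1>
  <p>Course tutor: <b>Jane Smith</b></p>
  <p>Credits: 15</p>
</body></html>
"""

tokens = html_to_tokens(sample_html)
labels = bio_labels(tokens, ["Jane Smith"])
for token, label in zip(tokens, labels):
    print(token, label)
```

Each (tokens, labels) pair produced this way is one training example for a sequence-labeling model. Discarding the markup is a simplification: if the tags around the name carry a useful signal in your documents, you may want to keep some of that structure as extra input.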