How to get started analyzing HTML documents

ChrisTaylorDeveloper · October 7, 2025, 5:49am

I am new to ML. How can I get started solving this problem with TF / Keras?

I have thousands of HTML documents, each containing information about a university course. Somewhere in each document will be the name(s) of the course tutor(s). This is the detail I need to extract from each document.

I specifically want to use AI for this task. I want to build a model which, given an HTML document, will find the tutor name(s) in that document.

How can I get started with such a task? Thank you.

jasmine_dhantule · October 15, 2025, 6:45am

Hi @ChrisTaylorDeveloper, welcome to the Forum!

Do you have a labeled dataset in which each document has its tutor names annotated? This is key for training a supervised ML model. If you don't already have such annotations, you will need to create them: each document needs a label indicating where the tutor names appear in its text. You can refer to the Named Entity Recognition (NER) tutorial, which covers a task similar to yours.
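To make the annotation step a bit more concrete, here is a minimal sketch of how one labeled example might be prepared, using only the Python standard library. It strips the HTML down to whitespace-separated tokens and assigns BIO tags (the labeling scheme commonly used for NER) to the tokens that match a known tutor name. The helper names `html_to_tokens` and `bio_tags`, the `TUTOR` label, the sample page, and the exact-string matching are all assumptions made for illustration; real pages will likely need more careful tokenization and manual checking of the labels.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_tokens(html):
    """Hypothetical helper: turn an HTML string into whitespace-split tokens."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with a space keeps text from adjacent tags apart; it would also
    # split a word that is broken up by inline tags (a known simplification).
    return " ".join(parser.chunks).split()


def bio_tags(tokens, tutor_names):
    """Hypothetical helper: tag tokens with B-TUTOR / I-TUTOR / O.

    Assumes each name appears in the text exactly as annotated, apart from
    surrounding punctuation.
    """
    cleaned = [t.strip(".,;:()") for t in tokens]
    tags = ["O"] * len(tokens)
    for name in tutor_names:
        name_tokens = name.split()
        n = len(name_tokens)
        if n == 0:
            continue
        for i in range(len(cleaned) - n + 1):
            if cleaned[i:i + n] == name_tokens:
                tags[i] = "B-TUTOR"
                for j in range(i + 1, i + n):
                    tags[j] = "I-TUTOR"
    return tags


# Made-up sample page and annotation, for illustration only.
example_html = (
    "<html><body><h1>Intro to Biology</h1>"
    "<p>Tutor: <b>Jane Smith</b>, room 12</p>"
    "<script>var x = 1;</script></body></html>"
)
tokens = html_to_tokens(example_html)
tags = bio_tags(tokens, ["Jane Smith"])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Pairs of token and tag sequences like these are the kind of input a token-classification (NER) model is usually trained on; how you then encode them for Keras depends on the tutorial or model you follow.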
