Help needed: I am planning to build a question bank system using a large language model (LLM). The idea is to upload photos of students’ practice exercises into the system. However, I’ve run into a problem: in subjects like math, questions often include geometric diagrams or other images, and I don’t know how to get the information in these images into the database.
I would like to know whether it’s possible to use LLMs to process these images. Ideally, I want to keep the original images so that they can be included later when generating practice materials (such as PDFs or other formats).
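To make the "keep the original images" part concrete, here is a rough sketch of one possible storage layout: each diagram stays on disk as an ordinary image file, and the database records only its path next to the question text. Everything named here is an assumption for illustration (SQLite as the database, the `questions` table, the `image_path` column, and the function names); the question text itself would come from whatever OCR or LLM step ends up being used, which this sketch does not cover.

```python
import sqlite3


def create_question_bank(db_path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS questions (
            id            INTEGER PRIMARY KEY AUTOINCREMENT,
            subject       TEXT NOT NULL,
            question_text TEXT NOT NULL,
            image_path    TEXT  -- path to the original diagram, NULL if none
        )
        """
    )
    return conn


def add_question(conn, subject, question_text, image_path=None):
    """Store one question; the image itself stays on disk, only its path is saved."""
    cur = conn.execute(
        "INSERT INTO questions (subject, question_text, image_path) VALUES (?, ?, ?)",
        (subject, question_text, image_path),
    )
    conn.commit()
    return cur.lastrowid


def questions_for_subject(conn, subject):
    """Return (id, question_text, image_path) rows for one subject, oldest first."""
    return conn.execute(
        "SELECT id, question_text, image_path FROM questions "
        "WHERE subject = ? ORDER BY id",
        (subject,),
    ).fetchall()


if __name__ == "__main__":
    conn = create_question_bank()
    add_question(conn, "math", "Find angle ABC in the triangle shown.", "images/q1.png")
    for row in questions_for_subject(conn, "math"):
        print(row)
```

With a layout like this, a later step that generates a PDF could read `image_path` and place the original diagram beside the question text, so the image never has to be converted into text at all.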
I don’t have a programming background and plan to rely entirely on LLMs to assist with the development. I hope to receive some advice or guidance on how to proceed.