How can I extract information from a website using the Gemini API?

I am trying to use Gemini 1.5 Flash through the Google API to interpret a URL, so that the model reports, as a percentage, how much of that page addresses a specific maker project. I used the following prompt:

response = model.generate_content(f"What percentage of the content is dedicated to a specific project on robotics or maker activities. Study: {study_input}. IMPORTANT: your answer should be a percentage indicating how much the content discusses educational robotics activities or a specific maker project; do not consider pages that compile multiple projects. If it is exactly this type of content, it should be 100%. If it’s other content but mentions this theme at certain points, provide a percentage indicating how much the context discusses the theme, from 0% to 100%. IMPORTANT: Your answer should only be the percentage value.")

The problem is that when the page is a compilation of studies, such as a Pinterest page that brings together several project titles rather than one specific project, the model returns 100%.

Welcome to the forum.
I suspect that several years of web development focused on boosting SEO have muddied the waters for anyone trying to discern relevance the way you are now. This will likely be a fun and challenging project.

Your main lever is your prompt. Give the model more guidance on how to score. Few-shot prompting can make a big difference: inside the prompt, before the question you actually want answered, show it a few example pages (including the Pinterest page) together with the score you would want for each.
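
Here is a rough sketch of what a few-shot version of the prompt could look like. The example descriptions, the scores attached to them (0% for the compilation page, 20% for the partial mention), and the helper names build_few_shot_prompt and parse_percentage are all assumptions made up for illustration; swap in real pages and the scores you actually want.

```python
import re

# Hypothetical labelled examples: (summary of page content, score you want).
# Replace these with real pages and your own target scores.
FEW_SHOT_EXAMPLES = [
    ("Step-by-step build guide for one line-following robot, "
     "with a parts list and code.", "100%"),
    ("Pinterest page showing titles and thumbnails of dozens of "
     "different robot projects, with no detail on any single one.", "0%"),
    ("School newsletter with five paragraphs, one of which describes "
     "a robotics club activity.", "20%"),
]

INSTRUCTIONS = (
    "Estimate what percentage of the content is dedicated to one specific "
    "educational robotics or maker project. Pages that only compile "
    "multiple projects do not count as a specific project. "
    "Reply with only the percentage value."
)


def build_few_shot_prompt(study_input, examples=FEW_SHOT_EXAMPLES):
    """Put the scoring rules first, then the worked examples, then the real input."""
    parts = [INSTRUCTIONS, ""]
    for content, score in examples:
        parts.append(f"Content: {content}")
        parts.append(f"Answer: {score}")
        parts.append("")
    parts.append(f"Content: {study_input}")
    parts.append("Answer:")
    return "\n".join(parts)


def parse_percentage(reply):
    """Return the first 'N%' value (0-100) found in the reply, or None."""
    match = re.search(r"\b(\d{1,3})\s*%", reply)
    if match is None:
        return None
    value = int(match.group(1))
    return value if 0 <= value <= 100 else None
```

You would then pass the result to the same call you already use, e.g. model.generate_content(build_few_shot_prompt(study_input)), and run parse_percentage on the reply text so that a stray word around the number does not break your pipeline.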

Hope that helps.