How can I extract information from a website using the Gemini API

Talysson_Emanoel · October 30, 2024, 11:57am

I am trying to use Gemini 1.5 Flash through the Google API to interpret a URL, so that the model estimates, as a percentage, how much of the page addresses one specific maker project. I used the following prompt:

response = model.generate_content(f"What percentage of the content is dedicated to a specific project on robotics or maker activities. Study: {study_input}. IMPORTANT: your answer should be a percentage indicating how much the content discusses educational robotics activities or a specific maker project; do not consider pages that compile multiple projects. If it is exactly this type of content, it should be 100%. If it’s other content but mentions this theme at certain points, provide a percentage indicating how much the context discusses the theme, from 0% to 100%. IMPORTANT: Your answer should only be the percentage value.")

The problem is that when the page is a compilation of studies, such as a Pinterest page that brings together several project titles rather than one specific project, the model returns 100%.

OrangiaNebula · October 30, 2024, 4:30pm

Welcome to the forum.
I suspect that years of web development aimed at boosting SEO have muddied the waters for anyone trying to judge a page's relevance, as you are now. This will likely be a fun and challenging project.

Your main lever is your prompt. Give the model more guidance on how to score. Few-shot prompting can make a big difference: before asking your actual question, include example pages in the prompt, the Pinterest page among them, each paired with the score you would want the model to return.
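
Here is a minimal sketch of what a few-shot prompt could look like in Python. The example pages and their scores are made up for illustration (you would replace them with real pages and the scores you actually want), and the names build_few_shot_prompt and parse_percentage are hypothetical helpers, not part of the Gemini SDK.

```python
import re

# Hypothetical labeled examples: (page text or summary, score you want back).
# These are placeholders; swap in real pages and your own target scores.
FEW_SHOT_EXAMPLES = [
    ("Step-by-step tutorial for building one line-following robot, "
     "with a parts list and code.", 100),
    ("Pinterest page listing dozens of robotics project titles "
     "with thumbnails and links.", 0),
    ("School newsletter where one paragraph out of ten mentions "
     "a robotics workshop.", 10),
]


def build_few_shot_prompt(page_text, examples=FEW_SHOT_EXAMPLES):
    """Assemble instructions, scored examples, then the page to be scored."""
    lines = [
        "Rate how much of a page is dedicated to ONE specific educational "
        "robotics or maker project.",
        "Pages that only compile or list multiple projects get a low score.",
        "Reply with the percentage value only.",
        "",
    ]
    for text, score in examples:
        lines.append(f"Page: {text}")
        lines.append(f"Answer: {score}%")
        lines.append("")
    # The real question comes last, in the same format as the examples.
    lines.append(f"Page: {page_text}")
    lines.append("Answer:")
    return "\n".join(lines)


def parse_percentage(reply):
    """Return the first 0-100 percentage found in the reply, or None."""
    match = re.search(r"(?<!\d)(\d{1,3})\s*%", reply)
    if match is None:
        return None
    value = int(match.group(1))
    return value if 0 <= value <= 100 else None


# Usage with the model object from the original post (assumed to exist):
# response = model.generate_content(build_few_shot_prompt(study_input))
# score = parse_percentage(response.text)
```

Ending the prompt with "Answer:" in the same format as the examples nudges the model to reply with just the number, and parse_percentage guards against replies that include extra words anyway.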

Hope that helps.
