Properly summarizing a website with images

Hello, I want to build a Gemini-API-based website summarizer that differs from the existing ones, which do a “poor” job (IMHO). The differences I want to implement:

  • I will visit the website with a browser, so that dynamically-generated HTML is also considered by the summary (other solutions just do a “GET ”, and that’s also what Gemini’s URL context feature does)
  • I want Gemini to also read and interpret images (others don’t do it, no idea why, images do contain relevant content, especially in technical articles!)
  • In the generated summary, I want Gemini to not just generate text, but the summary should also include the links to the most relevant images of the website
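
The scraping side of the three points above could be sketched roughly as follows. This is only a minimal sketch under assumptions: Playwright is one possible browser driver (any headless browser would do), and `render_html`, `OrderedExtractor` and `extract_items` are hypothetical names invented for illustration. The extractor uses only the standard library and keeps text and images in page order.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


def render_html(url):
    """Load the page in a real browser so dynamically generated HTML is included.

    Assumption: Playwright is installed (pip install playwright; playwright install).
    """
    from playwright.sync_api import sync_playwright  # lazy: optional dependency

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html


class OrderedExtractor(HTMLParser):
    """Collect visible text and <img> tags in the order they appear."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.items = []        # ("text", text) or ("image", absolute_url, alt)
        self._skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img" and not self._skip_depth:
            attr = dict(attrs)
            if attr.get("src"):
                url = urljoin(self.base_url, attr["src"])
                self.items.append(("image", url, attr.get("alt") or ""))

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self._skip_depth:
            self.items.append(("text", text))


def extract_items(html, base_url):
    parser = OrderedExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.items
```

Usage would be something like `extract_items(render_html(url), url)`. Downloading the image bytes and filtering out icons or tracking pixels is left out here.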

I’m aware that this means I have to build my own implementation that scrapes the website’s text and images. However, I’m unsure what exactly the generate_content() call should look like for this kind of multimodal input. The docs show a brief example, which doesn’t help me much. I would need to feed Gemini a (long) contents list where text and images alternate (in the order they appear on the website). And somehow I would need to give Gemini some context for each image (e.g., the alt text and the URL, so that Gemini can reference the URL in its output).

Any suggestions?

Check out https://ai.google.dev/gemini-api/docs/image-understanding as that gives an example of using multiple images in the same contents prompt. I would look at using multiple parts (of one content block) rather than spreading them over turns (separate contents).
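
To make the "multiple parts of one content block" idea concrete, here is a minimal sketch, not a definitive implementation. It assumes the google-genai Python SDK (`from google import genai`) and the model name `gemini-2.5-flash`; `PageItem`, `build_parts` and `summarize` are hypothetical names. Each image is preceded by a short text label carrying its URL and alt text, so the model has something to cite.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageItem:
    """One scraped element, in page order (hypothetical structure)."""
    kind: str                      # "text" or "image"
    text: str = ""                 # paragraph text, or alt text for an image
    url: str = ""                  # image URL (images only)
    data: Optional[bytes] = None   # raw image bytes (images only)
    mime_type: str = "image/png"


def build_parts(items):
    """Flatten scraped items into one ordered list of parts.

    Text becomes a plain string. Each image becomes a text label
    (number, URL, alt text) followed by a (mime_type, bytes) tuple, so the
    context sits right before the image it belongs to.
    Images without downloaded bytes are skipped.
    """
    parts = ["Summarize this page. When an image is important, cite its URL "
             "exactly as given in the label that precedes it."]
    image_no = 0
    for item in items:
        if item.kind == "text":
            parts.append(item.text)
        elif item.kind == "image" and item.data:
            image_no += 1
            parts.append(f"[Image {image_no}] url={item.url} alt={item.text!r}")
            parts.append((item.mime_type, item.data))
    return parts


def summarize(items, model="gemini-2.5-flash"):
    """Send everything as parts of a single request (needs google-genai)."""
    from google import genai          # lazy import: optional dependency
    from google.genai import types

    contents = [
        types.Part.from_bytes(data=p[1], mime_type=p[0]) if isinstance(p, tuple) else p
        for p in build_parts(items)
    ]
    client = genai.Client()           # reads the API key from the environment
    response = client.models.generate_content(model=model, contents=contents)
    return response.text
```

Passing a flat list of strings and parts should put them all into one user turn, which is the single-content layout suggested above. For very image-heavy pages the request size may become a concern, so uploading images separately might be worth looking into.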

Also, if you are using multiple “artifacts” (text and pictures), I would probably break the problem down: describe the pictures and summarize the text in separate calls, then concatenate the summaries (of both text and pictures) with a new generateContent call.
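
A rough sketch of that staged approach, under stated assumptions: `summarize_in_stages` is a hypothetical helper, and `generate` stands for any callable that takes a list of parts and returns the model's text (for example a thin wrapper around the SDK's generate_content call). Keeping the model call behind a callable makes the flow easy to try out with a stub.

```python
def summarize_in_stages(page_text, images, generate):
    """Summarize text and images in separate calls, then merge the results.

    page_text: the scraped article text
    images:    sequence of (url, alt_text, image_part) tuples
    generate:  callable(list_of_parts) -> str (hypothetical model wrapper)
    """
    # Call 1: summarize the text on its own.
    text_summary = generate(["Summarize the following article text:", page_text])

    # One call per image: a short description, tagged with the image URL.
    image_notes = []
    for url, alt, image_part in images:
        prompt = f"Describe this image in two sentences. Alt text: {alt!r}"
        description = generate([prompt, image_part])
        image_notes.append(f"- {url}: {description}")

    # Final call: merge everything into one summary that cites image URLs.
    merge_prompt = "\n".join(
        [
            "Combine the text summary and the image descriptions into one summary.",
            "Include the URLs of the most relevant images.",
            "",
            "Text summary:",
            text_summary,
            "",
            "Images:",
            *image_notes,
        ]
    )
    return generate([merge_prompt])
```

With the google-genai SDK, `generate` could be something like `lambda parts: client.models.generate_content(model=MODEL, contents=parts).text`, where `client` and `MODEL` are placeholders. The trade-off is more requests per page, and the model no longer sees each image next to the text that surrounds it.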

For the actual structure required, I worked with a Gemini agent to work through the requirements and code structure. It took me about a day but was well worth it :)

Good luck.

Thanks, I’ll try your ideas out.