Hello, I want to build a Gemini-API-based website summarizer that differs from the existing summarizers, which (IMHO) do a poor job. The differences I want to implement:
- I will visit the website with a real browser, so that dynamically generated HTML is also considered in the summary (other solutions just issue a plain GET request, and that’s also what Gemini’s URL-context feature does)
- I want Gemini to also read and interpret images (others don’t, no idea why; images do contain relevant content, especially in technical articles!)
- The generated summary should not just be text; it should also include links to the most relevant images on the website
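For the scraping step, here is roughly what I have in mind so far. It assumes I already have the fully rendered HTML from a headless browser (e.g. Playwright’s `page.content()`), and it walks the document in order, collecting alternating text and image blocks; the block shape (`{"type": ..., ...}` dicts) is just my own working format, not anything Gemini-specific:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageWalker(HTMLParser):
    """Collect text and <img> references in document order."""

    SKIP = {"script", "style", "noscript"}  # non-content elements to ignore

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.blocks = []        # interleaved {"type": "text"|"image", ...} dicts
        self._skip_depth = 0    # >0 while inside a SKIP element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img":
            a = dict(attrs)
            if a.get("src"):
                self.blocks.append({
                    "type": "image",
                    "url": urljoin(self.base_url, a["src"]),  # resolve relative src
                    "alt": a.get("alt", ""),
                })

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            # Merge consecutive text chunks into one block to keep the list short.
            if self.blocks and self.blocks[-1]["type"] == "text":
                self.blocks[-1]["text"] += " " + text
            else:
                self.blocks.append({"type": "text", "text": text})


def extract_blocks(html, base_url):
    """Return the page's text and images as interleaved blocks, in page order."""
    walker = PageWalker(base_url)
    walker.feed(html)
    return walker.blocks
```

(Real pages would probably need more filtering, e.g. dropping nav/footer text, but this captures the ordering idea.)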
I’m aware that this means I have to build my own implementation that scrapes the website’s text and images. However, I’m unsure how exactly the generate_content() call should look for this kind of multi-modal input. The docs show a brief example, which doesn’t help me much. I would need to feed Gemini a (long) contents list in which text and images alternate (in the order they appear on the website). And somehow I would need to give Gemini some context for each image (e.g., the alt text and the URL, so that Gemini can reference the URL in its output).
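My current sketch of the contents assembly looks like this. It turns the scraped blocks (my own format: page-order dicts with a "type" key) into the dict-style parts that the Gemini API accepts, with a small text part before each image carrying its URL and alt text so the model can cite the URL in the summary. The `fetch_image` callable is a hypothetical downloader I’d still have to write; whether this part shape is the best way to do it is exactly what I’m unsure about:

```python
import base64


def build_parts(blocks, fetch_image):
    """Turn interleaved text/image blocks into Gemini-style 'parts' dicts.

    blocks:      page-order list of {"type": "text"|"image", ...} dicts (my format)
    fetch_image: callable url -> (bytes, mime_type); downloading is out of scope here
    """
    parts = []
    for b in blocks:
        if b["type"] == "text":
            parts.append({"text": b["text"]})
        else:
            # Context part just before the image, so the model can
            # reference the URL and alt text in its summary.
            parts.append({"text": f'[Image {b["url"]} | alt: "{b["alt"]}"]'})
            data, mime = fetch_image(b["url"])
            parts.append({"inline_data": {
                "mime_type": mime,
                "data": base64.b64encode(data).decode("ascii"),  # REST wants base64
            }})
    return parts
```

I would then prepend my summarization instructions as the first part and send everything as a single user turn, something like `client.models.generate_content(model=..., contents=[{"role": "user", "parts": [{"text": instructions}] + parts}])` with the google-genai SDK (which, as I understand it, also offers `types.Part.from_bytes(...)` with raw bytes instead of these base64 dicts). Is that the right structure?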
Any suggestions?