Bulk image processing using the Gemini API: a concrete way to reference the images

Sam_Putnam · April 4, 2025, 8:53pm

Looking at the link below, I see that you are supposed to reference each image via an index. But I can't figure out how to do that in a way that lets me be certain each description is accurate.

I want the JSON array I return to reference an attribute, an index, or something similar, so that I can be certain that the generated description matches the image and that the model is not hallucinating the order.

My code looks like this:

$json = '{
  "contents": [
    {
      "parts": [
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "' . $imageData . '"
          }
        },
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "' . $imageData2 . '"
          }
        },
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "' . $imageData3 . '"
          }
        },
        {"text": "Describe each image. Ignore the webcam feed in the top-right corner, if present. Return a valid JSON object that is an array, with each object in the array having two parameters, url and description, with url being the url for the image provided and description being the description of the image."}
      ]
    }
  ]
}';

$url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key=$YOUR_API_KEY";

Govind_Keshari · April 7, 2025, 5:40am

Hi @Sam_Putnam, welcome to the forum!

Indexing or numbering the images is the recommended practice for getting the desired output. Do you want structured output for each image as a JSON array (for example, url and description)?

If so, there are two ways: give clear instructions in the prompt, or use the generation config (structured output).
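
As a rough sketch of how the two ways could be combined in PHP: each image is preceded by a short text label carrying its index, and the generation config asks for a JSON array of index/description objects. The function name buildLabeledRequest, the "Image N:" label wording, and the index field are my own assumptions for illustration. responseMimeType and responseSchema are the structured-output fields of generationConfig; please check the docs for whether the model you call supports them.

```php
<?php
// Hypothetical helper (not part of any SDK): builds a generateContent request
// body in which every image is preceded by a text label carrying its index.
function buildLabeledRequest(array $base64Images): array
{
    $parts = [];
    foreach (array_values($base64Images) as $i => $data) {
        $index = $i + 1;
        // The label sits immediately before the image it refers to.
        $parts[] = ['text' => "Image $index:"];
        $parts[] = ['inline_data' => ['mime_type' => 'image/jpeg', 'data' => $data]];
    }
    $parts[] = ['text' => 'Describe each image. Ignore the webcam feed in the top-right corner, if present. '
        . 'Return one entry per image, using the index from the label that precedes it.'];

    return [
        'contents' => [['parts' => $parts]],
        // Structured output: ask for a JSON array of {index, description}.
        'generationConfig' => [
            'responseMimeType' => 'application/json',
            'responseSchema' => [
                'type' => 'ARRAY',
                'items' => [
                    'type' => 'OBJECT',
                    'properties' => [
                        'index' => ['type' => 'INTEGER'],
                        'description' => ['type' => 'STRING'],
                    ],
                    'required' => ['index', 'description'],
                ],
            ],
        ],
    ];
}
```

With the variables from the question, the request body would be json_encode(buildLabeledRequest([$imageData, $imageData2, $imageData3])). Building the array and encoding it also avoids hand-written JSON mistakes such as trailing commas.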

Still, the best practice is to use indexing together with structured output for better results.
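
One way to make use of the index on your side, sketched in PHP under some assumptions: the model returns a JSON array of objects with an integer index (1 for the first image sent) and a description, and you already know each image's URL locally. The helper name matchDescriptions is hypothetical. The URL is attached by your own code rather than generated by the model, and the check fails if an index is missing, repeated, or out of range. It confirms that the output is structurally consistent, not that the model actually looked at the right image.

```php
<?php
// Hypothetical helper: maps the model's JSON array back onto the URLs we sent.
// $urls is listed in the order the images were sent (index 1 = first image).
// $modelJson is the text of the model's reply, e.g. the string found at
// candidates[0].content.parts[0].text in the generateContent response.
function matchDescriptions(array $urls, string $modelJson): array
{
    $items = json_decode($modelJson, true);
    if (!is_array($items)) {
        throw new RuntimeException('Model output is not a JSON array.');
    }

    $urls = array_values($urls);
    $result = [];
    foreach ($items as $item) {
        $index = is_array($item) ? ($item['index'] ?? null) : null;
        if (!is_int($index) || $index < 1 || $index > count($urls)) {
            throw new RuntimeException('Unexpected index in model output.');
        }
        if (isset($result[$index])) {
            throw new RuntimeException("Index $index returned more than once.");
        }
        // The URL comes from our own list, never from the model.
        $result[$index] = [
            'url' => $urls[$index - 1],
            'description' => (string) ($item['description'] ?? ''),
        ];
    }

    if (count($result) !== count($urls)) {
        throw new RuntimeException('Some images have no description.');
    }

    ksort($result); // restore the order in which the images were sent
    return array_values($result);
}
```

If an exception is thrown, retrying the request or sending the images one at a time are both reasonable fallbacks.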

You can go through this doc once.

Thanks
