Bulk image processing using the Gemini API: a concrete way to reference the images

Looking at the link below, I see that you are supposed to reference each image via an index. But I can't figure out how to do that in a way where I can be certain that each description is accurate.

I want the JSON array that is returned to reference an attribute, an index, or something similar, so that I can be certain that each generated description matches its image and that the model is not hallucinating the order.

My code looks like this:

$json = '{
  "contents": [
    {
      "parts": [
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "' . $imageData . '"
          }
        },
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "' . $imageData2 . '"
          }
        },
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "' . $imageData3 . '"
          }
        },
        {"text": "Describe each image. Ignore the webcam feed in the top-right corner, if present. Return a valid JSON object that is an array, with each object in the array having two parameters, url and description, with url being the url for the image provided and description being the description of the image."}
      ]
    }
  ]
}';

$url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key=$YOUR_API_KEY";

Hi @Sam_Putnam, welcome to the forum!

Indexing or numbering the images is the recommended practice for getting the desired output from them. Do you want structured output for each image in a JSON array (with fields like url and description)?
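One way to make the index explicit is to put a short text label in front of each image, so the model has something concrete to refer back to. The sketch below is only an illustration: the helper name `buildIndexedParts`, the "Image N:" label wording, and the `index` field requested in the prompt are assumptions, not something the API requires. It also builds the body with `json_encode` instead of string concatenation, which avoids quoting mistakes.

```php
<?php
// Hypothetical helper: returns a "parts" list in which every image is
// preceded by a text label ("Image 1:", "Image 2:", ...).
function buildIndexedParts(array $base64Images, string $instruction): array
{
    $parts = [];
    foreach (array_values($base64Images) as $i => $data) {
        $index = $i + 1; // 1-based, to match the wording of the labels
        $parts[] = ['text' => "Image $index:"];
        $parts[] = ['inline_data' => ['mime_type' => 'image/jpeg', 'data' => $data]];
    }
    // The instruction goes last and asks for the label number back.
    $parts[] = ['text' => $instruction];
    return $parts;
}

// Placeholder strings standing in for real base64-encoded JPEG data.
$images = ['AAAA', 'BBBB', 'CCCC'];

$instruction = 'Describe each image. Ignore the webcam feed in the top-right corner, if present. '
    . 'Return a JSON array with one object per image, each having two fields: '
    . '"index" (the number from the label that precedes the image) and "description".';

$parts = buildIndexedParts($images, $instruction);
$json = json_encode(['contents' => [['parts' => $parts]]], JSON_THROW_ON_ERROR);
```

Asking for the index rather than the url keeps the model from having to invent a value it was never given; the url can be looked up from the index on your side.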

If so, there are two ways to get it: give clear instructions in the prompt, or use the generation config (structured output).
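For the config route, a minimal sketch of a request body might look like the following. It assumes the `generationConfig` fields `responseMimeType` and `responseSchema` of the v1beta REST API, and that the chosen model supports a response schema; the `index` and `description` property names are illustrative choices, and the `parts` list is left as a placeholder.

```php
<?php
// Schema for the reply: an array of {index, description} objects.
// The property names are illustrative, not mandated by the API.
$generationConfig = [
    'responseMimeType' => 'application/json',
    'responseSchema' => [
        'type' => 'ARRAY',
        'items' => [
            'type' => 'OBJECT',
            'properties' => [
                'index' => ['type' => 'INTEGER'],
                'description' => ['type' => 'STRING'],
            ],
            'required' => ['index', 'description'],
        ],
    ],
];

$request = [
    'contents' => [
        // Placeholder: the image parts and the prompt text would go here.
        ['parts' => [['text' => 'Describe each image.']]],
    ],
    'generationConfig' => $generationConfig,
];

$json = json_encode($request, JSON_THROW_ON_ERROR);
```

With a schema in place, the prompt no longer has to describe the JSON layout in words, only what each field should contain.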

Still, the best practice is to use indexing together with structured output for better results.
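Once the reply carries an index per item, the urls can be attached on your side instead of being generated by the model. The sketch below assumes the model was asked for a JSON array of objects with a 1-based `index` and a `description`; the function name `mapDescriptionsToUrls` and the example urls are hypothetical. It only checks that every index appears exactly once and is in range. It cannot prove that a description is semantically right for its image.

```php
<?php
// Hypothetical helper: maps the model's {index, description} items back to
// the urls that were sent, and rejects replies with bad or missing indices.
function mapDescriptionsToUrls(string $modelJson, array $urls): array
{
    $urls = array_values($urls);
    $items = json_decode($modelJson, true);
    if (!is_array($items)) {
        throw new RuntimeException('Model output is not a JSON array.');
    }

    $seen = [];
    $byUrl = [];
    foreach ($items as $item) {
        $index = is_array($item) ? ($item['index'] ?? null) : null;
        if (!is_int($index) || $index < 1 || $index > count($urls)) {
            throw new RuntimeException('Item has a missing or out-of-range index.');
        }
        if (isset($seen[$index])) {
            throw new RuntimeException("Index $index appears more than once.");
        }
        $seen[$index] = true;
        $byUrl[$urls[$index - 1]] = (string) ($item['description'] ?? '');
    }

    if (count($seen) !== count($urls)) {
        throw new RuntimeException('Some images have no description.');
    }
    return $byUrl;
}

// Example with made-up urls and a made-up reply, in shuffled order.
$urls = ['https://example.com/a.jpg', 'https://example.com/b.jpg'];
$reply = '[{"index": 2, "description": "B"}, {"index": 1, "description": "A"}]';
$result = mapDescriptionsToUrls($reply, $urls);
```

If the check fails, retrying the request or sending the images one at a time are reasonable fallbacks.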

You can go through this doc for the details.

Thanks