How to do batch inference on prompt-image pairs with the Gemini API without getting errors

Hey all, I’m new to the Gemini API and want to use it to generate better product descriptions from an image and a prompt. The prompt contains the instructions and relevant product info such as title, description, price, and location. I thought Gemini 1.5 Flash would be best suited for this, since I could fit hundreds or thousands of images and prompts into a single request, and making thousands of separate requests is slow. However, I can only pass 2 image-prompt pairs without getting errors. When I add 3 images and 3 prompts, I get the response below. I looked for documentation or examples that explain it but couldn’t find anything relevant. It would be very helpful if anyone could suggest a fix for this issue, or show an alternative way of doing this. Thanks.

response:

GenerateContentResponse(
    done=True,
    iterator=None,
    result=glm.GenerateContentResponse({
      "candidates": [
        {
          "finish_reason": 4,
          "index": 0,
          "safety_ratings": [],
          "token_count": 0,
          "grounding_attributions": []
        }
      ]
    }),
)

Code for recreating the issue:

import google.generativeai as genai
import os
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
generation_config = {
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 50,
  "max_output_tokens":  1000000,
  "response_mime_type": "application/json",
}
safety_settings = [ {
    "category": "HARM_CATEGORY_HARASSMENT",
    "threshold": "BLOCK_NONE",
  },
  {
    "category": "HARM_CATEGORY_HATE_SPEECH",
    "threshold": "BLOCK_NONE",
  },
  {
    "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "threshold": "BLOCK_NONE",
  },
  {
    "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
    "threshold": "BLOCK_NONE",
  },]
instruction='I have provided the following Prompt and Image Pair, For which you need to output a valid json object which must contain these types of responses: Structured Key-Value Pairs named as structured_response and a Descriptive Narrative named as descriptive_response, Make sure to include all the attributes you found in structured_response as well as in the prompt and the image to be also described in descriptive_response, dont output "null" or "N/A" values in structured_response, the resulting json should look like {"structured_response": {"all the key-value pairs"},"descriptive_response":"descriptive response"}'

# Generate Prompts for all the items in the pq file
import pandas as pd
from PIL import Image
import io

# Function to convert bytes to PIL image
def bytes_to_pil(image_bytes):
    return Image.open(io.BytesIO(image_bytes))

# Loop through the DataFrame and create detailed prompts
for index, row in clean_df.iterrows():

    title = row['title']
    description = row['description']
    price = str(row['price']) + ' '+row['currency_code']
    pub_date = row['pub_date']
    num_views = row['num_views']
    category = row['category']
    address = row['address']
    country = row['country']
    region = row['region']
    city = row['city']
    detailed_prompt = (
        f"Generate an exhaustive description of the item or items or a collection based on the image and the following information:\n"
        f"Title: {title}\n"
        f"Description: {description}\n"
        f"Price: {price}\n"
        f"Publication Date: {pub_date}\n"
        f"Number of Views: {num_views}\n"
        f"Category: {category}\n"
        f"Address: {address}\n"
        f"Country: {country}\n"
        f"Region: {region}\n"
        f"City: {city}\n"
        "Features, Brand, Occasion, Room, Usage, Type, Category, Condition, Model, Designer, Price, "
        "Length, Width, Height, Weight, Fabric, Texture, Style Details, Sales Rank, Customer Reviews, "
        "Manufacturer, Assembly Required, Dimensions, Care Instructions, Return Policy, Shipping Weight, "
        "Shipping Dimensions, Origin, Material Details, Manufacturer Warranty, Product Dimensions, "
        "Product Weight, Top Selling Product, Best Seller, Special Offers, Deals, Accessories, Availability, "
        "Number of Items, Recommended Uses, Included Components, Features List, Technical Details, Safety Information, "
        "Ingredients, Nutritional Information, Packaging, Product Details, Target Audience, Gift Wrap Available, "
        "Age Range, Shipping or Pickup, Free Shipping Included,"
        "Address,Country, Region, City,Number of Views (which should indicate popularity), Publication Date,"
        "Additional Information, Other Attributes.\n"
        "Describe the content of your image with as many details as you can, focusing only on the main subject of the image and ignoring all irrelevant parts. "
        "Also, describe what the item is used for, its condition, and any visible signs of damage. If there is any visible text or branding related to the item, include it in the description. "
        "If it is a product, provide its year of release/making. Target audience and age group. Whether the Item is boxed or unboxed, In the box or outside the box. "
        "If it's a collection of items also provide description of all the relevant items in the image."
        "Add in details about to whom this item is a great option for."
        "Add in information about installation of the item, if it should be done by the user or professional help is required"
        "If there are issues with the item's functioning or reliability also mention that."
        "and If any relevant information required for the attributes that isn't provided in this prompt about the item you can infer that from the image or retrieve from your knowledge base and add to the description"
        "If the attributes are null or empty don't output them in structured_response"
    )
    
    # Add the prompt to the DataFrame
    clean_df.at[index, 'prompt'] = detailed_prompt

# Display the DataFrame with the new prompt column
print(clean_df["prompt"][0])

# Start the request with the instruction; the prompt/image parts follow it.
# (Assigning arr[0] on an empty list would raise IndexError.)
arr = [instruction]

# Loop through the DataFrame and create detailed prompts
for index, row in clean_df.iterrows():
    # Increasing this to 3 gives an error
    if index < 2:
        prompt = row['prompt']
        image_bytes = row['image_bytes']
        image = bytes_to_pil(image_bytes)
        # Keep the list flat (prompt, image, prompt, image, ...) as in the sample below
        arr.extend([prompt, image])



# Sample arr with 3 prompt-image pairs (the 3rd pair is a copy of the 2nd)
arr = ['I have provided Prompt and Image Pairs, For which you need to output a valid json array with json objects in which every json object must contain these types of responses: Structured Key-Value Pairs named as structured_response and a Descriptive Narrative named as descriptive_response,Make sure to describe all the attributes you found in structured_response as well as in the prompt to be also in descriptive_response and to have as much detail as possible, dont output "null" or "N/A" values in structured_response, the resulting json should look like [{"structured_response": {"all the key-value pairs"},"descriptive_response":"descriptive response"},...rest of the json objects...]',
 "Generate an exhaustive description of the item or items or a collection based on the image and the following information:\nTitle: 1980-86 Ford  8ft bed, tailgate\nDescription: Southern 8ft . F150, 250 ,350  No rust. With tailgate. Rhino lined bed .Great condition $2000 firm calls only  Have bobcat to load  Location Capon bridge wv 26711\nPrice: 2000.0 USD\nPublication Date: 2023-10-26T16:55:29\nNumber of Views: 14\nCategory: Auto parts\nAddress: \nCountry: United States\nRegion: West Virginia\nCity: Capon Bridge\nFeatures, Brand, Occasion, Room, Usage, Type, Category, Condition, Model, Designer, Price, Length, Width, Height, Weight, Fabric, Texture, Style Details, Sales Rank, Customer Reviews, Manufacturer, Assembly Required, Dimensions, Care Instructions, Return Policy, Shipping Weight, Shipping Dimensions, Origin, Material Details, Manufacturer Warranty, Product Dimensions, Product Weight, Top Selling Product, Best Seller, Special Offers, Deals, Accessories, Availability, Number of Items, Recommended Uses, Included Components, Features List, Technical Details, Safety Information, Ingredients, Nutritional Information, Packaging, Product Details, Target Audience, Gift Wrap Available, Age Range, Shipping or Pickup, Free Shipping Included,Address,Country, Region, City,Number of Views (which should indicate popularity), Publication Date,Additional Information, Other Attributes.\nDescribe the content of your image with as many details as you can, focusing only on the main subject of the image and ignoring all irrelevant parts. Also, describe what the item is used for, its condition, and any visible signs of damage. If there is any visible text or branding related to the item, include it in the description. If it is a product, provide its year of release/making. Target audience and age group. Whether the Item is boxed or unboxed, In the box or outside the box. 
If it's a collection of items also provide description of all the relevant items in the image.Add in details about to whom this item is a great option for.Add in information about installation of the item, if it should be done by the user or professional help is requiredIf there are issues with the item's functioning or reliability also mention that.and If any relevant information required for the attributes that isn't provided in this prompt about the item you can infer that from the image or retrieve from your knowledge base and add to the description",
 <PIL.PngImagePlugin.PngImageFile image mode=RGB size=480x340>,
 "Generate an exhaustive description of the item or items or a collection based on the image and the following information:\nTitle: Candela 2-Piece Sectional - Contemporary Comfort for Your Living Space\nDescription: We offer No Credit Needed Programs, Layaway, Fast Delivery or even Same Day Delivery for a fee! We also accept Cash, Credit Card, Venmo, Cash App, Zelle in stores! Was $2700. and now $853  Call us at (555) 836-7092  Buy Online on our nadia.tanaka135@example.com or Unlike other stores, we have stores on DMV! Visit Our Physical Locations!   Here are all of our addresses:  JMD Furniture     JMD Furniture     JMD Furniture  7702B-D Richmond Hwy.  Alexandria, VA    The Hours are  Monday to Saturday - 10 am to 8 pm  Sunday - 10 am to 6 pm   Explore our diverse furniture and mattress collection! Discover Living Room Sets, Bedrooms, Dining Room Sets, Sectionals, Sofa Sets, Sofa Love Seats, Double Pillow Top Mattresses, Rugs, Carpets, Wall Art, Lamps, Dressers, Chests, Nightstands, Beds, Sleigh Beds, Platform Beds, Headboards, Rails, Box Springs, and a wide selection of other items   *Sofa and Love Seat*         Living Room *Couch and loveseat         Sectional* *Sectional*\nPrice: 853.0 USD\nPublication Date: 2023-11-14T16:51:24\nNumber of Views: 21\nCategory: Furniture\nAddress: \nCountry: United States\nRegion: Maryland\nCity: Mount Rainier\nFeatures, Brand, Occasion, Room, Usage, Type, Category, Condition, Model, Designer, Price, Length, Width, Height, Weight, Fabric, Texture, Style Details, Sales Rank, Customer Reviews, Manufacturer, Assembly Required, Dimensions, Care Instructions, Return Policy, Shipping Weight, Shipping Dimensions, Origin, Material Details, Manufacturer Warranty, Product Dimensions, Product Weight, Top Selling Product, Best Seller, Special Offers, Deals, Accessories, Availability, Number of Items, Recommended Uses, Included Components, Features List, Technical Details, Safety Information, Ingredients, Nutritional Information, Packaging, 
Product Details, Target Audience, Gift Wrap Available, Age Range, Shipping or Pickup, Free Shipping Included,Address,Country, Region, City,Number of Views (which should indicate popularity), Publication Date,Additional Information, Other Attributes.\nDescribe the content of your image with as many details as you can, focusing only on the main subject of the image and ignoring all irrelevant parts. Also, describe what the item is used for, its condition, and any visible signs of damage. If there is any visible text or branding related to the item, include it in the description. If it is a product, provide its year of release/making. Target audience and age group. Whether the Item is boxed or unboxed, In the box or outside the box. If it's a collection of items also provide description of all the relevant items in the image.Add in details about to whom this item is a great option for.Add in information about installation of the item, if it should be done by the user or professional help is requiredIf there are issues with the item's functioning or reliability also mention that.and If any relevant information required for the attributes that isn't provided in this prompt about the item you can infer that from the image or retrieve from your knowledge base and add to the description",
 <PIL.PngImagePlugin.PngImageFile image mode=RGB size=480x340>,
 "Generate an exhaustive description of the item or items or a collection based on the image and the following information:\nTitle: Candela 2-Piece Sectional - Contemporary Comfort for Your Living Space\nDescription: We offer No Credit Needed Programs, Layaway, Fast Delivery or even Same Day Delivery for a fee! We also accept Cash, Credit Card, Venmo, Cash App, Zelle in stores! Was $2700. and now $853  Call us at (555) 836-7092  Buy Online on our nadia.tanaka135@example.com or Unlike other stores, we have stores on DMV! Visit Our Physical Locations!   Here are all of our addresses:  JMD Furniture     JMD Furniture     JMD Furniture  7702B-D Richmond Hwy.  Alexandria, VA    The Hours are  Monday to Saturday - 10 am to 8 pm  Sunday - 10 am to 6 pm   Explore our diverse furniture and mattress collection! Discover Living Room Sets, Bedrooms, Dining Room Sets, Sectionals, Sofa Sets, Sofa Love Seats, Double Pillow Top Mattresses, Rugs, Carpets, Wall Art, Lamps, Dressers, Chests, Nightstands, Beds, Sleigh Beds, Platform Beds, Headboards, Rails, Box Springs, and a wide selection of other items   *Sofa and Love Seat*         Living Room *Couch and loveseat         Sectional* *Sectional*\nPrice: 853.0 USD\nPublication Date: 2023-11-14T16:51:24\nNumber of Views: 21\nCategory: Furniture\nAddress: \nCountry: United States\nRegion: Maryland\nCity: Mount Rainier\nFeatures, Brand, Occasion, Room, Usage, Type, Category, Condition, Model, Designer, Price, Length, Width, Height, Weight, Fabric, Texture, Style Details, Sales Rank, Customer Reviews, Manufacturer, Assembly Required, Dimensions, Care Instructions, Return Policy, Shipping Weight, Shipping Dimensions, Origin, Material Details, Manufacturer Warranty, Product Dimensions, Product Weight, Top Selling Product, Best Seller, Special Offers, Deals, Accessories, Availability, Number of Items, Recommended Uses, Included Components, Features List, Technical Details, Safety Information, Ingredients, Nutritional Information, Packaging, 
Product Details, Target Audience, Gift Wrap Available, Age Range, Shipping or Pickup, Free Shipping Included,Address,Country, Region, City,Number of Views (which should indicate popularity), Publication Date,Additional Information, Other Attributes.\nDescribe the content of your image with as many details as you can, focusing only on the main subject of the image and ignoring all irrelevant parts. Also, describe what the item is used for, its condition, and any visible signs of damage. If there is any visible text or branding related to the item, include it in the description. If it is a product, provide its year of release/making. Target audience and age group. Whether the Item is boxed or unboxed, In the box or outside the box. If it's a collection of items also provide description of all the relevant items in the image.Add in details about to whom this item is a great option for.Add in information about installation of the item, if it should be done by the user or professional help is requiredIf there are issues with the item's functioning or reliability also mention that.and If any relevant information required for the attributes that isn't provided in this prompt about the item you can infer that from the image or retrieve from your knowledge base and add to the description",
 <PIL.PngImagePlugin.PngImageFile image mode=RGB size=480x340>]






model = genai.GenerativeModel('gemini-1.5-flash', safety_settings=safety_settings,generation_config=generation_config)
response = model.generate_content(arr)

print(response.candidates[0].content.parts[0].text)

Output with 2 prompt-image pairs:

[
    {
        "structured_response": {
            "Brand": "Ford",
            "Condition": "Great",
            "Model": "F150, 250, 350",
            "Price": "2000",
            "Length": "8 ft",
            "Material Details": "Rhino lined",
            "Other Attributes": "No rust, With tailgate",
            "Product Dimensions": "8 ft"
        },
        "descriptive_response": "The image shows a truck bed, which is being lifted by a bobcat. It appears to be in good condition with no visible damage. The bed is made of metal and has a Rhino lined surface. It is 8 feet long, has a tailgate, and comes with no rust. It is likely used for hauling goods and materials."
    },
    {
        "structured_response": {
            "Brand": "JMD Furniture",
            "Price": "853",
            "Room": "Living Room",
            "Type": "Sectional",
            "Category": "Furniture",
            "Condition": "New",
            "Model": "Candela 2-Piece Sectional",
            "Style Details": "Contemporary",
            "Dimensions": "2-piece",
            "Availability": "In Stock",
            "Number of Items": "2",
            "Recommended Uses": "Living Room",
            "Included Components": "2-piece Sectional",
            "Features List": "No Credit Needed Programs, Layaway, Fast Delivery, Same Day Delivery",
            "Target Audience": "Individuals and families",
            "Age Range": "Adults"
        },
        "descriptive_response": "The image shows a gray, two-piece sectional sofa. It is a contemporary design with clean lines. The sofa is in brand new condition and appears to be made of fabric. It can be used as a comfortable seating option in a living room. It is unboxed and in a showroom setting."
    }
]

Welcome to the forum!

The problem, I believe, is this specification in your generation_config: "max_output_tokens": 1000000.

You cannot set max_output_tokens above the system limit (which you can get from list_models()). For Gemini 1.5 (both Pro and Flash) that value is 8192. The one-million-token context window applies to input tokens.
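
Here is a rough sketch of how you might look up the limit and cap your setting. The helper names clamp_max_output_tokens and fetch_output_token_limit are made up for illustration, and I'm assuming the model objects returned by list_models() expose an output_token_limit attribute and a name with a "models/" prefix, which is how the SDK has behaved for me. The lookup needs a configured API key, so only the offline part is exercised here.

```python
def clamp_max_output_tokens(requested: int, output_token_limit: int) -> int:
    """Cap the requested max_output_tokens at the model's output limit."""
    return min(requested, output_token_limit)


def fetch_output_token_limit(model_name: str) -> int:
    """Look up a model's output limit via the SDK (requires genai.configure first)."""
    # Imported lazily so the pure helper above can be used without the SDK.
    import google.generativeai as genai

    for m in genai.list_models():
        # Assumption: names are reported as "models/<model_name>".
        if m.name == f"models/{model_name}":
            return m.output_token_limit
    raise ValueError(f"model not found: {model_name}")


# 8192 is the limit quoted above for Gemini 1.5; in real code you could use
# fetch_output_token_limit("gemini-1.5-flash") instead of hard-coding it.
generation_config = {
    "max_output_tokens": clamp_max_output_tokens(1_000_000, 8192),
    "response_mime_type": "application/json",
}
print(generation_config["max_output_tokens"])  # prints 8192
```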

That effectively forces you to partition your input into chunks of probably 2, and to make as many generate_content() requests as it takes to get through the dataset.
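
A minimal sketch of that partitioning, in case it helps. chunk_pairs, build_parts and describe_all are hypothetical helpers, not part of the SDK, and the chunk size of 2 is just the guess from above; you may need to tune it. It assumes model is a genai.GenerativeModel (anything with a generate_content() method works) and that pairs is a list of (prompt, image) tuples built from your DataFrame.

```python
from typing import Any, Iterator, List, Optional, Sequence, Tuple

Pair = Tuple[str, Any]  # (prompt text, PIL image)


def chunk_pairs(pairs: Sequence[Pair], chunk_size: int = 2) -> Iterator[Sequence[Pair]]:
    """Yield consecutive slices of at most chunk_size prompt-image pairs."""
    for start in range(0, len(pairs), chunk_size):
        yield pairs[start:start + chunk_size]


def build_parts(instruction: str, chunk: Sequence[Pair]) -> List[Any]:
    """Build the flat request list: instruction, prompt, image, prompt, image, ..."""
    parts: List[Any] = [instruction]
    for prompt, image in chunk:
        parts.extend([prompt, image])
    return parts


def describe_all(model: Any, instruction: str, pairs: Sequence[Pair],
                 chunk_size: int = 2) -> List[Optional[str]]:
    """Send one generate_content() request per chunk and collect the raw text."""
    results: List[Optional[str]] = []
    for chunk in chunk_pairs(pairs, chunk_size):
        response = model.generate_content(build_parts(instruction, chunk))
        try:
            results.append(response.text)
        except ValueError:
            # Assumption: response.text raises ValueError when the candidate has
            # no text (like the empty response in the question); record None
            # so the failed chunk can be retried or inspected later.
            results.append(None)
    return results
```

With your code this would be called roughly as describe_all(model, instruction, list(zip(clean_df["prompt"], images))), where images is the list of PIL images you already build with bytes_to_pil. Each returned string is the JSON array for one chunk, so you would still parse and concatenate them yourself.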
