API intermittently ignoring multiple documents

Hi there, I’m struggling to get Gemini to summarise multiple documents in one payload.
When I submit the following:

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "fileData": {
            "fileUri": "https://generativelanguage.googleapis.com/v1beta/files/6yp1nlyt523d",
            "mimeType": "application/pdf"
          }
        },
        {
          "fileData": {
            "fileUri": "https://generativelanguage.googleapis.com/v1beta/files/r9tx3opn6dz",
            "mimeType": "application/pdf"
          }
        }
      ]
    },
    {
      "role": "user",
      "parts": [
        {
          "text": "what can you tell me about all of these files?"
        }
      ]
    }
  ],
  "systemInstruction": {
    "role": "user",
    "parts": [
      {
        "text": "always end a response with 'END MESSAGE'"
      }
    ]
  },
  "generationConfig": {
    "temperature": 1,
    "topK": 40,
    "topP": 0.95,
    "maxOutputTokens": 8192,
    "responseMimeType": "text/plain"
  }
}

Most (70%?) of the time, Gemini summarises only one of the files supplied and states that it has received only one. The other 30% of the time it works as intended.

An equivalent prompt using PNGs works fine every time.

Could someone point me in the right direction here? Given that it works some of the time, it might be a problem with my prompt, or maybe I’m malforming the JSON payload (I’m building it dynamically in my .NET app)? I’m pulling my hair out here.
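One way to rule out a malformed body is to validate the exact string the app sends before posting it. Below is a minimal Python sketch; validate_payload, count_file_parts and check_raw are hypothetical helper names, not part of any SDK. It checks only the contents/parts/fileData shape used in this thread, not the full API schema, and the same checks could be ported to the .NET app.

```python
import json


def validate_payload(payload) -> list:
    """Return human-readable problems found in a generateContent request body.

    Hypothetical helper: it checks only the shape used in this thread
    (contents -> parts -> text | fileData), not the full API schema.
    """
    if not isinstance(payload, dict):
        return ["top-level value must be a JSON object"]
    contents = payload.get("contents")
    if not isinstance(contents, list) or not contents:
        return ["'contents' must be a non-empty list"]
    problems = []
    for i, content in enumerate(contents):
        role = content.get("role")
        if role not in (None, "user", "model"):
            problems.append(f"contents[{i}]: unexpected role {role!r}")
        parts = content.get("parts")
        if not isinstance(parts, list) or not parts:
            problems.append(f"contents[{i}]: 'parts' must be a non-empty list")
            continue
        for j, part in enumerate(parts):
            # Each part in these payloads should carry exactly one kind of data.
            kinds = [k for k in ("text", "fileData") if k in part]
            if len(kinds) != 1:
                problems.append(f"contents[{i}].parts[{j}]: expected exactly one of text/fileData")
            elif kinds[0] == "fileData":
                for key in ("fileUri", "mimeType"):
                    if not part["fileData"].get(key):
                        problems.append(f"contents[{i}].parts[{j}].fileData: missing {key}")
    return problems


def count_file_parts(payload: dict) -> int:
    """Count fileData parts, to confirm every file actually made it into the body."""
    return sum(
        1
        for content in payload.get("contents", [])
        for part in content.get("parts", [])
        if "fileData" in part
    )


def check_raw(body: str) -> list:
    """Parse the exact string the app sends, then validate its structure."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    return validate_payload(payload)
```

Logging the result of check_raw and count_file_parts for each request would show quickly whether both files are actually reaching the API in the body.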

// Edit: with the following prompt, which submits a PDF and an HTML file, only the PDF is ever recognised. This happens 100% of the time.

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "fileData": {
            "fileUri": "https://generativelanguage.googleapis.com/v1beta/files/wcn44yqwsqqf",
            "mimeType": "application/pdf"
          }
        },
        {
          "fileData": {
            "fileUri": "https://generativelanguage.googleapis.com/v1beta/files/3w0j07r636cn",
            "mimeType": "text/html"
          }
        }
      ]
    },
    {
      "role": "user",
      "parts": [
        {
          "text": "summarise each of these files and provide a count of all files I have sent you"
        }
      ]
    }
  ],
  "systemInstruction": {
    "role": "user",
    "parts": [
      {
        "text": "always end a response with 'END MESSAGE'"
      }
    ]
  },
  "generationConfig": {
    "temperature": 0.4,
    "topK": 40,
    "topP": 0.95,
    "maxOutputTokens": 8192,
    "responseMimeType": "text/plain"
  }
}

Thanks,
Phil.

Hi @Phil_Sullivan, please try the cURL request below. With it, I am able to summarize multiple documents in one request.

%%bash
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key=$GOOGLE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "contents": [
    {
      "parts": [
        {
          "text": "can you summarize about all of these files?"
        },
        {
          "fileData": {
            "fileUri": "https://generativelanguage.googleapis.com/v1beta/files/if6r6aaxozo6",
            "mimeType": "text/html"
          }
        },
        {
          "fileData": {
            "fileUri": "https://generativelanguage.googleapis.com/v1beta/files/fw7t0ybhltgr",
            "mimeType": "application/pdf"
          }
        }
      ],
      "role": "user"
    }
  ]
}'
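For reproducing this outside Colab, a rough Python equivalent of the cURL call above, using only the standard library, might look like the following. It assumes the key is in the GOOGLE_API_KEY environment variable (as in the cURL example) and keeps the text-first part order; build_request and send are hypothetical helper names.

```python
import json
import os
import urllib.request

MODEL = "gemini-1.5-flash-latest"
BASE = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(file_parts, prompt, api_key=None):
    """Build (but do not send) a generateContent POST request.

    file_parts: list of (file_uri, mime_type) pairs, an assumed input shape.
    """
    api_key = api_key or os.environ.get("GOOGLE_API_KEY", "YOUR_KEY")
    body = {
        "contents": [
            {
                "role": "user",
                # Prompt text first, then one fileData part per file, as in the cURL body.
                "parts": [{"text": prompt}]
                + [{"fileData": {"fileUri": uri, "mimeType": mime}} for uri, mime in file_parts],
            }
        ]
    }
    url = f"{BASE}/{MODEL}:generateContent?key={api_key}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling send(build_request(...)) with the two file URIs from the cURL example returns the decoded response; the generated text is under candidates[0].content.parts[0].text.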

Hope it helps…

Hey @GUNAND_MAYANGLAMBAM thanks for helping out with this!

I’ve tried matching your payload structure and switched to the latest Flash model you’re using, but I’m still getting the same result. It only summarises one of the files, and which one it summarises seems to change occasionally.

https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key={key}

{
  "contents": [
    {
      "parts": [
        {
          "text": "can you summarize about all of these files?"
        },
        {
          "fileData": {
            "fileUri": "https://generativelanguage.googleapis.com/v1beta/files/7oppvqk3idmi",
            "mimeType": "text/html"
          }
        },
        {
          "fileData": {
            "fileUri": "https://generativelanguage.googleapis.com/v1beta/files/cqskgarg7vda",
            "mimeType": "application/pdf"
          }
        }
      ],
      "role": "user"
    }
  ]
}

I’ve tried this with and without the system prompt and generation config and have the same result.

Here is the Colab gist for reference.

If possible, can you share a sample PDF / HTML file?

@GUNAND_MAYANGLAMBAM

Here is a link to the files I’ve been trying to summarise.
https://drive.google.com/drive/folders/1XmbCKF1yZzWedWYwdxI6G9a107EsYcI-?usp=sharing

On a hunch, I think there might be something about the HTML data that’s causing the problem. I tried your update with two PDFs and it worked as it should. The PDF is fairly long; I wonder if it’s tied to that?

I really appreciate your help with this. Thanks so much.

Hey @Phil_Sullivan, it’s great to know that it worked with multiple PDFs.

Your HTML file is not the issue. I tried summarizing your HTML file along with some PDF files, and it worked with a slight adjustment to the prompt.

@GUNAND_MAYANGLAMBAM would you mind sharing that prompt with me? I’ve tried a bunch of different prompts on my end and have yet to find one that reliably gets both/all documents parsed every time.

Try the prompt below:

Provide summary for each of the given below files

  • Adjust Payload Structure: Try separating the fileData parts into distinct elements within the "contents" array. This may help Gemini AI identify each file as an individual input:

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "fileData": {
            "fileUri": "https://generativelanguage.googleapis.com/v1beta/files/6yp1nlyt523d",
            "mimeType": "application/pdf"
          }
        }
      ]
    },
    {
      "role": "user",
      "parts": [
        {
          "fileData": {
            "fileUri": "https://generativelanguage.googleapis.com/v1beta/files/r9tx3opn6dz",
            "mimeType": "application/pdf"
          }
        }
      ]
    },
    {
      "role": "user",
      "parts": [
        {
          "text": "What can you tell me about all of these files?"
        }
      ]
    }
  ],
  "systemInstruction": {
    "role": "user",
    "parts": [
      {
        "text": "Always end a response with 'END MESSAGE'"
      }
    ]
  },
  "generationConfig": {
    "temperature": 1,
    "topK": 40,
    "topP": 0.95,
    "maxOutputTokens": 8192,
    "responseMimeType": "text/plain"
  }
}
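The separated structure suggested above can be generated rather than hand-written. A minimal Python sketch, assuming files arrive as (fileUri, mimeType) pairs; build_separated_contents is a hypothetical name, and whether one turn per file actually improves recognition is the suggestion being tested here, not something confirmed.

```python
def build_separated_contents(files, question, system_text=None):
    """Build a generateContent body with one user turn per file.

    files: list of (file_uri, mime_type) pairs (an assumed input shape).
    """
    contents = [
        {"role": "user", "parts": [{"fileData": {"fileUri": uri, "mimeType": mime}}]}
        for uri, mime in files
    ]
    # The question goes last, in its own user turn, mirroring the payload above.
    contents.append({"role": "user", "parts": [{"text": question}]})
    payload = {"contents": contents}
    if system_text is not None:
        # Role "user" mirrors the systemInstruction used throughout this thread.
        payload["systemInstruction"] = {"role": "user", "parts": [{"text": system_text}]}
    return payload
```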

  • Add Explicit Instructions: Since Gemini AI seems to struggle with multiple files, you could add an explicit instruction in the prompt to clarify that there are multiple PDFs:

{
  "role": "user",
  "parts": [
    {
      "text": "These are multiple PDF files. Please provide a summary for each of them."
    }
  ]
}
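If the explicit instruction is built dynamically, it can also state how many files were sent and list them by name, which gives the model (and you) something concrete to check against. A sketch assuming the app knows each file's display name; explicit_instruction and instruction_turn are hypothetical names, and the model only sees those names through this text, since the fileData parts in these payloads carry only a URI and a MIME type.

```python
def explicit_instruction(file_names):
    """Build an instruction that states how many files were sent and names each one.

    file_names: display names known to the app (hypothetical input).
    """
    n = len(file_names)
    noun = "file" if n == 1 else "files"
    listing = "\n".join(f"{i}. {name}" for i, name in enumerate(file_names, start=1))
    return (
        f"You have been given {n} {noun}:\n{listing}\n"
        "Please provide a separate summary for every file listed above, "
        "using each file name as a heading."
    )


def instruction_turn(file_names):
    """Wrap the instruction in a user turn, in the same shape as the snippet above."""
    return {"role": "user", "parts": [{"text": explicit_instruction(file_names)}]}
```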

  • Check JSON Formation: When dynamically building the JSON in your .NET app, ensure the JSON is correctly formatted and validated before sending it to Gemini AI. Malformed JSON could cause inconsistent processing behavior.
  • Add Retry Logic: Since the problem occurs intermittently, consider implementing a retry mechanism in your .NET app that resends the request if the response does not cover both files.
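The retry idea can be made concrete by checking the reply for each expected file name and resending when one is missing. This is a heuristic sketch: it assumes the prompt asks the model to name every file it summarises, and send is a caller-supplied function (hypothetical, e.g. a wrapper around the app's HTTP client) that posts the payload and returns the reply text. The delay doubles after each failed attempt.

```python
import time


def summarise_with_retry(send, payload, expected_names, max_attempts=3,
                         base_delay=1.0, sleep=time.sleep):
    """Resend payload until every expected file name appears in the reply.

    The name check is a heuristic that relies on the prompt asking the
    model to name each file it summarises.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    missing = list(expected_names)
    for attempt in range(max_attempts):
        reply = send(payload)
        missing = [n for n in expected_names if n.lower() not in reply.lower()]
        if not missing:
            return reply
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"reply still missing {missing} after {max_attempts} attempts")
```

Retrying only masks the underlying problem, so logging how often it triggers is worthwhile.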

@GUNAND_MAYANGLAMBAM I tried this prompt with my HTML file and, unfortunately, it did not work. It ONLY seems to summarise the HTML file and ignores the PDF.

@SURESH_KUMAR I’ve investigated these leads, and I’ve improved my payload as a result.

{
  "contents": [
    {
      "parts": [
        {
          "fileData": {
            "fileUri": "https://generativelanguage.googleapis.com/v1beta/files/r4kvbz7euziu",
            "mimeType": "text/html"
          }
        },
        {
          "text": "Please carefully review the content of this file: simplehtml.html."
        },
        {
          "fileData": {
            "fileUri": "https://generativelanguage.googleapis.com/v1beta/files/iv552u3ea7g9",
            "mimeType": "image/png"
          }
        },
        {
          "text": "Please carefully review the content of this file: ComfyUI_00059_.png."
        }
      ],
      "role": "user"
    },
    {
      "parts": [
        {
          "text": "can you give me a summary of the data you have been provided?"
        }
      ],
      "role": "user"
    }
  ],
  "systemInstruction": {
    "role": "user",
    "parts": [
      {
        "text": "always end a response with \u0027END MESSAGE\u0027"
      }
    ]
  },
  "generationConfig": {
    "temperature": 1,
    "topK": 40,
    "topP": 0.95,
    "maxOutputTokens": 8192,
    "responseMimeType": "text/plain"
  }
}
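The interleaved layout above (each fileData part followed by a short text part naming the file) is easy to produce in a loop. A Python sketch with hypothetical helper names, assuming each file is described as a (display name, fileUri, mimeType) triple.

```python
def build_labelled_parts(files):
    """Interleave each fileData part with a text part naming that file.

    files: list of (display_name, file_uri, mime_type) triples (assumed shape).
    """
    parts = []
    for name, uri, mime in files:
        parts.append({"fileData": {"fileUri": uri, "mimeType": mime}})
        parts.append({"text": f"Please carefully review the content of this file: {name}."})
    return parts


def build_labelled_payload(files, question):
    """Files and labels in one user turn, the question in a second user turn."""
    return {
        "contents": [
            {"parts": build_labelled_parts(files), "role": "user"},
            {"parts": [{"text": question}], "role": "user"},
        ]
    }
```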

With this structure, I’ve managed to confirm the following:

  • Two PDFs are correctly summarised.
  • Two images are correctly summarised.
  • An image and a PDF are correctly summarised.

However, I’m now convinced there’s a problem with the non-PDF documents:

  • When I send a simplified HTML file and an image, only the HTML file is summarised.
  • Changing the MIME type of the HTML file to “text/plain” has the same problem: only the HTML file is summarised.

I feel like my only way forward here is to dynamically convert all documents to PDFs on my backend?
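Before converting everything to PDF, a cheaper experiment (not verified anywhere in this thread) would be to extract the HTML's visible text locally and send it as an inline text part instead of a fileData part. The sketch below swaps in that technique using Python's standard html.parser module; html_to_text and inline_html_part are hypothetical names, and the extraction is deliberately crude: it drops all markup and skips script/style contents.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def inline_html_part(name: str, html: str) -> dict:
    """A text part carrying the extracted content, labelled with the file name."""
    return {"text": f"Contents of file {name}:\n{html_to_text(html)}"}
```

The returned part can take the place of the HTML file's fileData part in any of the payloads above, leaving the PDF and image parts unchanged.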

Before I implement that, would it be possible for you to run my payload in this post, and confirm I’m not going mad? (I haven’t deleted the files, so the file links should be valid for 48 hours)

// Edit: I just tried reordering the file parts inside “contents”, and in the example in this post, ONLY the first file listed is summarised.