Upload PDF to Gemini File API

Ron_Parker · June 24, 2024, 8:17pm

I want to send a PDF file to Gemini Flash using the API. To get the code, I uploaded a PDF file to the AI Studio chat, then copied the generated code into a Python script that uploads a PDF to Flash. But the script returns an error:

google.api_core.exceptions.InvalidArgument: 400 Unsupported MIME type: application/pdf

The Google documentation specifically says that Gemini Pro 1.5 and Gemini Flash 1.5 APIs support PDF uploads: Learn about the Gemini models | Vertex AI in Firebase.

I searched the developer forum and found a solution in “How to use the Gemini API with a PDF?”, which is the process I followed. But it does not work.

I don’t want to send extracted text, I need to send the PDF itself because I want the model to look at it and respond to prompts about specific formatting.

Ron_Parker · July 1, 2024, 8:51am

Just an FYI:

It took me almost two weeks to finally get PDF upload working. Apparently, you can NOT upload PDF files to Gemini through the Google AI Studio API. Only through the Vertex AI API.

First you’ve got to set up your environment to use the Vertex AI API.

So I needed to install:
- gcloud cli (Google Cloud SDK)
  - During this process, I also set my credentials using the user authentication process:
    - Authenticate for using the gcloud CLI | Authentication | Google Cloud
- google-cloud-aiplatform (Python SDK)
  - This installs the Vertex AI API

Here is the base code I worked with: Process a PDF file with Gemini 1.5 Pro | Generative AI on Vertex AI | Google Cloud

Also note that you will need to create your own Google Cloud Storage buckets for file upload: https://console.cloud.google.com

Then from Navigation Menu, select “Cloud Storage” → “Buckets”. So when you upload, these are your variables:

    `print(f"File {local_file_path} uploaded to gs://{bucket_name}/{destination_blob_name}")`
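For anyone following along, the upload step might look roughly like the sketch below. It assumes the `google-cloud-storage` package is installed and that application default credentials are already configured; `gcs_uri` and `upload_pdf` are hypothetical helper names chosen for illustration, not part of any SDK.

```python
def gcs_uri(bucket_name: str, destination_blob_name: str) -> str:
    """Build the gs:// URI for an object in a Cloud Storage bucket."""
    return f"gs://{bucket_name}/{destination_blob_name}"


def upload_pdf(local_file_path: str, bucket_name: str, destination_blob_name: str) -> str:
    """Upload a local PDF to Cloud Storage and return its gs:// URI."""
    # Imported lazily so gcs_uri() stays usable without the SDK installed.
    from google.cloud import storage  # assumption: pip install google-cloud-storage

    client = storage.Client()  # picks up application default credentials
    blob = client.bucket(bucket_name).blob(destination_blob_name)
    blob.upload_from_filename(local_file_path, content_type="application/pdf")

    uri = gcs_uri(bucket_name, destination_blob_name)
    print(f"File {local_file_path} uploaded to {uri}")
    return uri
```

The returned `gs://` URI is what you would then hand to the Vertex AI SDK as the location of the PDF.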

And that’s it. Just providing this in case it helps someone else in the same boat down the line. What an adventure!

patrick_mullot1 · July 2, 2024, 6:45am

This might not be the best solution.
Pricing and quotas are very different between the Vertex AI API and the Gemini API; the latter is a lot cheaper.

To deal with the PDF issue, simply extract the text from the PDF and pass it as context. If there is no text, convert the PDF to PNG and pass the images as context. That is basically what Gemini does behind the scenes, I am told.
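A rough sketch of that text-first, image-fallback idea is shown here. It uses PyMuPDF for both text extraction and rendering, which is a substitution made for brevity and not the tooling described in this thread; `page_payload` and `pdf_to_payloads` are hypothetical names, and the 150 dpi default is an arbitrary assumption.

```python
from typing import Callable, List, Tuple, Union

Payload = Tuple[str, Union[str, bytes]]


def page_payload(text: str, render_png: Callable[[], bytes]) -> Payload:
    """Prefer extracted text; fall back to a PNG render when the page has none."""
    stripped = text.strip()
    if stripped:
        return ("text", stripped)
    # Only render when needed, since rasterizing a page is comparatively slow.
    return ("image", render_png())


def pdf_to_payloads(pdf_path: str, dpi: int = 150) -> List[Payload]:
    """Return one ("text", str) or ("image", png_bytes) payload per page."""
    import fitz  # assumption: PyMuPDF is installed (pip install pymupdf)

    payloads: List[Payload] = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            payloads.append(
                page_payload(
                    page.get_text(),
                    # Bind the current page so the callback renders the right one.
                    lambda page=page: page.get_pixmap(dpi=dpi).tobytes("png"),
                )
            )
    return payloads
```

Each payload can then be passed to the model as a text or image part, whichever the page produced.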

Ron_Parker · July 2, 2024, 8:41am

I just realized that I failed to mention my specific use case for this.

Extracting the text is precisely what I’ve been doing for the past year and a half. The problem is that I am now dealing with documents which contain strikethrough characters. For example: https://s3.us-west-2.amazonaws.com/docs.scbbs.com/docs/test/2022_Local_161_MOA_09.pdf

Most extractors either leave this text in, or return it as garbled text.

To date, I’ve not been able to find one PDF to text extractor which will reliably exclude the strikethrough characters from the extracted text. Marker markdown came closest, but would exclude text that should be included if it was sandwiched between strikethrough text.

So, for the moment, this is the only reliable automated solution I’ve been able to come up with.

patrick_mullot1 · July 2, 2024, 9:47am

I’ve never had to face this situation, sorry.
If this can help, I’ve used Ghostscript to convert PDF to PNG and psycopg2 (Python) to export text. I haven’t tried it with strikeout documents.

P.S.: Are you sure you can share this document publicly? It seems to contain pretty sensitive information…

Ron_Parker · July 2, 2024, 10:58am

It is public information available since the end of 2022.

And there’s the rub. My Google search for psycopg2 and strikethrough or strikeout text yields nothing.

I currently have the following set up as PDF-to-text extractors: AWS Textract, PyMuPDF, PdfToText, Solr (Tika) and Marker (markdown). I even tried LlamaParse. There just doesn’t appear to be any way to deal with strikethrough text in PDFs, other than using the multi-modal feature of an LLM to “visually” inspect them.

Logan_Kilpatrick · July 3, 2024, 12:10am

Hey! We are actively working on PDF file support in the API right now, it should land soon!

Ron_Parker · July 3, 2024, 12:25am

The Vertex AI API currently supports PDF file upload. It’s what I am using now. The Google AI Studio API, however, doesn’t. It seems to support every file format EXCEPT PDF.

Logan_Kilpatrick · July 3, 2024, 2:10am

Yeah, Vertex supports PDFs, but we don’t yet. Hang tight!

Ron_Parker · July 3, 2024, 4:23am

Thank you! If it’s not too much to ask, could you pass along to the developers a note to please look at how Gemini 1.5 Flash processes PDFs compared to Gemini 1.5 Pro? Pro seems to utilize its multi-modal abilities to “visualize” the PDF while Flash appears to just extract the text. Perhaps this is the architectural design, but if not, it would be helpful for Flash to be able to “see” tables, strikethrough, and other characters in the PDF the way Pro currently can.

I have a PDF with strikeout text which I’ve been using to test: https://s3.us-west-2.amazonaws.com/docs.scbbs.com/docs/test/2022_Local_161_MOA_09.pdf

This is the prompt (along with the PDF file upload):

    prompt = """
    You are a very professional PDF to text document extractor.
    Please extract the text from this PDF. 
    Ensure that all strikethrough text is excluded from the output. 
    Try to format any tables found in the PDF. 
    Do not include page numbers, page headers, or page footers.

    **Exclude Strikethrough:** Do not include any strikethrough text in the output.
    **Include Tables:** Tables should be preserved in the extracted text.
    **Exclude Page Headers, Page Footers, and Page Numbers:** Eliminate these elements which are typically not part of the main content.
    """
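For context, sending a prompt like this together with a PDF through the Vertex AI Python SDK might look roughly as follows. This is a sketch under assumptions, not the exact script used here: `extract_with_gemini` is a hypothetical helper, the default location and model name are placeholders, and the PDF is assumed to already be in a Cloud Storage bucket.

```python
PDF_MIME_TYPE = "application/pdf"


def extract_with_gemini(
    pdf_uri: str,
    prompt: str,
    project_id: str,
    location: str = "us-central1",        # assumption: any supported region
    model_name: str = "gemini-1.5-pro",   # assumption: swap in the model you use
) -> str:
    """Send a PDF stored in Cloud Storage plus a text prompt to Gemini on Vertex AI."""
    if not pdf_uri.startswith("gs://"):
        raise ValueError("pdf_uri must be a gs:// Cloud Storage URI")

    # Imported lazily; requires: pip install google-cloud-aiplatform
    import vertexai
    from vertexai.generative_models import GenerativeModel, Part

    vertexai.init(project=project_id, location=location)
    model = GenerativeModel(model_name)

    # The PDF is referenced by URI, so the file itself is not inlined in the request.
    pdf_part = Part.from_uri(pdf_uri, mime_type=PDF_MIME_TYPE)
    response = model.generate_content([pdf_part, prompt])
    return response.text
```

Calling it with the same `prompt` and a different `model_name` is one way to compare Pro and Flash on an identical input.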

Here are examples which represent the consistent output from Pro and Flash to the exact same prompt and PDF:

Flash: https://s3.us-west-2.amazonaws.com/docs.scbbs.com/docs/test/2022_Local_161_MOA_09_gemini_flash02.txt

Pro: https://s3.us-west-2.amazonaws.com/docs.scbbs.com/docs/test/2022_Local_161_MOA_09_gemini_pro01.txt

Pro is consistently good and accurate. Flash is consistently poor.

If the Devs could fix this in the AI Studio, that would be awesome!

Daniel_Jarquin · July 25, 2024, 8:30pm

To upload a PDF to the Gemini File API, you will need to follow the API’s documentation and guidelines. Below is a general approach for uploading a file using a typical file upload API. If you have specific API details for Gemini, be sure to follow their documentation. Here’s a general example:

Steps to Upload a PDF to an API

Obtain API Credentials

Make sure you have the necessary API credentials, such as an API key or token, which may be required for authentication.

Prepare Your PDF File

Ensure your PDF file is ready for upload and stored in a location accessible by your application or script.

Stephan_Noller · February 6, 2025, 11:22am

Hey Logan, has this landed in the meantime? Where can I find it?
