I want to send a PDF file to Gemini Flash using the API. To get the code, I uploaded a PDF to the AI Studio chat, then copied the generated code into a Python script that uploads a PDF to Flash. But the script returns an error:
google.api_core.exceptions.InvalidArgument: 400 Unsupported MIME type: application/pdf
The Google documentation specifically says that Gemini Pro 1.5 and Gemini Flash 1.5 APIs support PDF uploads: Learn about the Gemini models | Vertex AI for Firebase.
I searched the developer forum and found a solution, “How to use the Gemini API with a PDF?”, which is the process I followed. But it does not work.
I don’t want to send extracted text, I need to send the PDF itself because I want the model to look at it and respond to prompts about specific formatting.
Just an FYI:
It took me almost two weeks to finally get PDF upload working. Apparently, you can NOT upload PDF files to Gemini through the Google AI Studio API. Only through the Vertex AI API.
First you’ve got to set up your environment to use the Vertex AI API. So I needed to install:
- gcloud CLI (Google Cloud SDK)
  - During this process, I also set my credentials using the user authentication process:
- google-cloud-aiplatform (Python SDK)
  - This installs the Vertex AI API
Here is the base code I worked with: Process a PDF file with Gemini 1.5 Pro | Generative AI on Vertex AI | Google Cloud
Also note that you will need to create your own Google Cloud Storage buckets for file upload: https://console.cloud.google.com
Then from Navigation Menu, select “Cloud Storage” → “Buckets”. So when you upload, these are your variables:
`print(f"File {local_file_path} uploaded to gs://{bucket_name}/{destination_blob_name}")`
And that’s it. Just providing this in case it helps someone else in the same boat down the line. What an adventure!
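For anyone following along, the upload step can be sketched roughly as below. This is a minimal sketch, assuming the `google-cloud-storage` package is installed and application default credentials are already configured. The function names `gcs_uri` and `upload_pdf` are my own (hypothetical); the variable names mirror the print statement above.

```python
def gcs_uri(bucket_name: str, destination_blob_name: str) -> str:
    """Build the gs:// URI that points at a file in a Cloud Storage bucket."""
    return f"gs://{bucket_name}/{destination_blob_name}"


def upload_pdf(local_file_path: str, bucket_name: str, destination_blob_name: str) -> str:
    """Upload a local PDF to a Cloud Storage bucket and return its gs:// URI."""
    # Imported lazily so gcs_uri() works even without the package installed.
    from google.cloud import storage

    client = storage.Client()  # picks up application default credentials
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(local_file_path, content_type="application/pdf")

    uri = gcs_uri(bucket_name, destination_blob_name)
    print(f"File {local_file_path} uploaded to {uri}")
    return uri
```

The returned `gs://` URI is what you then hand to the Vertex AI call instead of the file bytes.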
This might not be the best solution.
Pricing and quotas differ a lot between the Vertex AI API and the Gemini API, and the Gemini API is much cheaper.
To deal with the PDF issue, simply extract the text from the PDF and pass it as context. If there is no text, convert the PDF to PNG and pass the images as context. I am told that is basically what Gemini does behind the scenes.
I just realized that I failed to mention my specific use case for this.
Which is precisely what I’ve been doing for the past year and a half. The problem is that I am now dealing with documents which contain strikethrough characters. For example: https://s3.us-west-2.amazonaws.com/docs.scbbs.com/docs/test/2022_Local_161_MOA_09.pdf
Most extractors either leave this text in, or return it as garbled text.
To date, I’ve not been able to find one PDF to text extractor which will reliably exclude the strikethrough characters from the extracted text. Marker markdown came closest, but would exclude text that should be included if it was sandwiched between strikethrough text.
So, for the moment, this is the only reliable automated solution I’ve been able to come up with.
I’ve never had to face this situation, sorry.
If this can help, I’ve used Ghostscript to convert PDF to PNG and psycopg2 (Python) to export text. I haven’t tried it with strikeout documents.
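The Ghostscript conversion can be sketched roughly like this. It is only a sketch, assuming the `gs` executable is on your PATH; the function names, the `page-%03d.png` naming pattern, and the 150 dpi default are illustrative choices, not anything from the posts above.

```python
import subprocess
from pathlib import Path


def ghostscript_png_command(pdf_path: str, out_dir: str, dpi: int = 150) -> list[str]:
    """Build a Ghostscript command that renders each PDF page to its own PNG."""
    out_pattern = str(Path(out_dir) / "page-%03d.png")  # %03d becomes the page number
    return [
        "gs",
        "-dSAFER", "-dBATCH", "-dNOPAUSE",  # non-interactive, exit when done
        "-sDEVICE=png16m",                  # 24-bit colour PNG output
        f"-r{dpi}",                         # render resolution in dpi
        f"-sOutputFile={out_pattern}",
        pdf_path,
    ]


def pdf_to_pngs(pdf_path: str, out_dir: str, dpi: int = 150) -> list[Path]:
    """Run Ghostscript and return the generated PNG paths in page order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ghostscript_png_command(pdf_path, out_dir, dpi), check=True)
    return sorted(Path(out_dir).glob("page-*.png"))
```

The resulting images can then be passed to the model as image inputs, one per page.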
PS: Are you sure you can share this document publicly? It seems to contain pretty sensitive information…
It is public information available since the end of 2022.
And there’s the rub. My Google search for psycopg2 and strikethrough or strikeout text yields nothing.
I currently have the following set up as PDF-to-text extractors: AWS Textract, PyMuPDF, PdfToText, Solr (Tika) and Marker (markdown). I even tried LlamaParse. There just doesn’t appear to be any way to deal with strikethrough text in PDFs, other than using the multi-modal feature of an LLM to “visually” inspect them.
Hey! We are actively working on PDF file support in the API right now, it should land soon!
The Vertex AI API currently supports PDF file upload. It’s what I am using now. The Google AI Studio API, however, doesn’t. It seems to support every file format EXCEPT PDF.
Yeah, Vertex supports PDFs, but we don’t yet. Hang tight!
Thank you! If it’s not too much to ask, could you pass along to the developers a note to please look at how Gemini 1.5 Flash processes PDFs compared to Gemini 1.5 Pro? Pro seems to use its multi-modal abilities to “visualize” the PDF, while Flash appears to just extract the text. Perhaps this is the architectural design, but if not, it would be helpful for Flash to be able to “see” tables, strikethrough, and other characters in the PDF the way Pro currently can.
I have a PDF with strikeout text which I’ve been using to test: https://s3.us-west-2.amazonaws.com/docs.scbbs.com/docs/test/2022_Local_161_MOA_09.pdf
This is the prompt (along with the PDF file upload):
prompt = """
You are a very professional PDF to text document extractor.
Please extract the text from this PDF.
Ensure that all strikethrough text is excluded from the output.
Try to format any tables found in the PDF.
Do not include page numbers, page headers, or page footers.
**Exclude Strikethrough:** Do not include any strikethrough text in the output.
**Include Tables:** Tables should be preserved in the extracted text.
**Exclude Page Headers, Page Footers, and Page Numbers:** Eliminate these elements which are typically not part of the main content.
"""
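Sending that prompt together with the PDF through the Vertex AI SDK might look roughly like the sketch below. It assumes the `google-cloud-aiplatform` package is installed and the PDF already sits in a Cloud Storage bucket. The function name `extract_with_gemini`, the default model name, and the `us-central1` location are assumptions for illustration, so swap in whatever your project uses.

```python
def extract_with_gemini(
    project_id: str,
    pdf_gcs_uri: str,
    prompt: str,
    model_name: str = "gemini-1.5-pro",  # assumed model name; adjust as needed
    location: str = "us-central1",       # assumed region
) -> str:
    """Ask a Gemini model on Vertex AI to process a PDF stored in Cloud Storage."""
    # Fail early on anything that is not a Cloud Storage URI.
    if not pdf_gcs_uri.startswith("gs://"):
        raise ValueError(f"Expected a gs:// URI, got: {pdf_gcs_uri}")

    # Imported lazily so the check above runs without the SDK installed.
    import vertexai
    from vertexai.generative_models import GenerativeModel, Part

    vertexai.init(project=project_id, location=location)
    model = GenerativeModel(model_name)

    # The PDF is referenced by URI with its MIME type, not sent as extracted text.
    pdf_part = Part.from_uri(pdf_gcs_uri, mime_type="application/pdf")
    response = model.generate_content([pdf_part, prompt])
    return response.text
```

Calling it once per model name with the same `prompt` and URI is how the Pro and Flash outputs can be compared side by side.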
Here are examples which represent the consistent output from Pro and Flash to the exact same prompt and PDF:
Flash: https://s3.us-west-2.amazonaws.com/docs.scbbs.com/docs/test/2022_Local_161_MOA_09_gemini_flash02.txt
Pro: https://s3.us-west-2.amazonaws.com/docs.scbbs.com/docs/test/2022_Local_161_MOA_09_gemini_pro01.txt
Pro is consistently good and accurate. Flash is consistently poor.
If the Devs could fix this in the AI Studio, that would be awesome!
To upload a PDF to the Gemini File API, you will need to follow the API’s documentation and guidelines. Below is a general approach for uploading a file using a typical file upload API. If you have specific API details for Gemini, be sure to follow their documentation. Here’s a general example:
Steps to Upload a PDF to an API
- Obtain API Credentials
- Make sure you have the necessary API credentials, such as an API key or token, which may be required for authentication.
- Prepare Your PDF File
- Ensure your PDF file is ready for upload and stored in a location accessible by your application or script.