Hey there, I’m currently facing an issue: I want to take a locally stored file from the user via the File API, obtain the File object, and upload it directly to the Google AI file manager, where it can be stored for further content generation. I can’t get this working in Node.js, because its methods and API don’t accept a File object.
One workaround for now is to encode the file in base64 and send that string along with the user prompt when generating content, but this is not recommended for large files above 20MB or so.
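For reference, the base64 workaround mentioned above can be sketched roughly as follows. This is only an illustration: `buildInlineRequest` is a hypothetical helper name, and the `inlineData` / `mimeType` / `data` field names are assumed from the REST request format rather than taken from this thread.

```javascript
// Sketch: build a generateContent request body that embeds the file inline as base64.
// buildInlineRequest is a hypothetical helper, not part of any SDK.
function buildInlineRequest(fileBytes, mimeType, prompt) {
  // Base64 makes the payload about one third larger than the raw bytes,
  // which is part of why this approach does not suit large files.
  const data = Buffer.from(fileBytes).toString("base64");
  return {
    contents: [
      {
        role: "user",
        parts: [
          { text: prompt },
          { inlineData: { mimeType, data } },
        ],
      },
    ],
  };
}
```

The returned object would be sent as the JSON body of a generateContent call; for anything large, uploading the file first and referencing it by URI avoids carrying the bytes in every request.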
What’s the right way to do this efficiently, using the recommended approach?
[IMPORTANT: I particularly want to upload and use scanned PDFs. Does the Gemini API not support scanned PDFs?]
Help would be really appreciated
I tried uploading via the REST API (using https://generativelanguage.googleapis.com/v1beta/files/).
It does seem to upload successfully, but when I provide the uploaded file URI and ask questions or send prompts about it, it throws an error saying there are no pages in the document.
Again, is this because the API can’t recognize scanned PDFs, or is it something else?
Hi,
You need to check the state of the uploaded file and see whether it is Active or not. Only then can you use it as a file resource as part of your generative requests.
See “Using files | Google AI for Developers” and “Explore vision capabilities with the Gemini API | Google AI for Developers” for more details.
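A minimal polling sketch for the state check described above, under a few assumptions: the file resource is fetched with a GET on `v1beta/files/<id>`, its `state` field takes values such as `PROCESSING`, `ACTIVE`, and `FAILED`, and Node 18+ provides a global `fetch`. `waitForActive` and its options are hypothetical names used only for illustration.

```javascript
// Sketch: poll the uploaded file until its state is ACTIVE.
// waitForActive is a hypothetical helper; fileName looks like "files/abc123".
async function waitForActive(fileName, apiKey, options = {}) {
  const { fetchFn = fetch, intervalMs = 2000, maxTries = 30 } = options;
  const url =
    `https://generativelanguage.googleapis.com/v1beta/${fileName}` +
    `?key=${encodeURIComponent(apiKey)}`;

  for (let attempt = 0; attempt < maxTries; attempt++) {
    const res = await fetchFn(url);
    if (!res.ok) throw new Error(`Fetching file metadata failed: HTTP ${res.status}`);

    const file = await res.json();
    if (file.state === "ACTIVE") return file; // ready to use in prompts
    if (file.state === "FAILED") throw new Error("File processing failed");

    // Still processing: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for the file to become ACTIVE");
}
```

Only once this resolves would the file URI be passed to a content generation request.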
Cheers
Hey, thanks for the reply, but it still doesn’t work.
Right now I’m trying to upload files with this API: https://generativelanguage.googleapis.com/upload/v1beta/files
The file does get uploaded successfully, but when I use the REST API for content generation with a body like this, for example:
{
  "contents": [
    {
      "parts": [
        {
          "text": "What is this doc about?"
        },
        {
          "file_data": {
            "mime_type": "application/pdf",
            "file_uri": "https://generativelanguage.googleapis.com/v1beta/files/olixgp5qm1zz"
          }
        }
      ]
    }
  ]
}
I get an error response:
{
  "error": {
    "code": 400,
    "message": "Request contains an invalid argument.",
    "status": "INVALID_ARGUMENT"
  }
}
Why would this be? The file is uploaded successfully with status Active, but the generate content API throws an invalid argument error.
Hi,
Here’s the REST payload I produce in my test cases for PDF handling.
{
  "model" : "models/gemini-1.5-pro-latest",
  "contents" : [ {
    "role" : "user",
    "parts" : [ {
      "text" : "Your are a very professional document summarization specialist. Please summarize the given document."
    }, {
      "fileData" : {
        "fileUri" : "https://generativelanguage.googleapis.com/v1beta/files/tflhg5hf78m0",
        "mimeType" : "application/pdf"
      }
    } ]
  } ]
}
This gives me an HTTP 200 OK with the information about the PDF document as requested.
The main difference might be the missing role key in your request.
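In JavaScript, a request equivalent to the payload above could be assembled roughly like this. It is a sketch, not a verified fix for the 400 error: `buildFileRequest` is a hypothetical helper, the model name is copied from the payload above, and the `models/<model>:generateContent` URL shape and `x-goog-api-key` header are assumptions based on the public REST format.

```javascript
// Sketch: build the URL and JSON body for a generateContent call that
// references a previously uploaded file by its URI.
// buildFileRequest is a hypothetical helper, not part of any SDK.
function buildFileRequest(fileUri, mimeType, prompt, model = "gemini-1.5-pro-latest") {
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`,
    body: {
      contents: [
        {
          role: "user", // the key suggested above as possibly missing
          parts: [
            { text: prompt },
            { fileData: { fileUri, mimeType } },
          ],
        },
      ],
    },
  };
}

// Possible usage (Node 18+, inside an async function; apiKey is your own key):
//   const { url, body } = buildFileRequest(uri, "application/pdf", "Summarize this document.");
//   const res = await fetch(url, {
//     method: "POST",
//     headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
//     body: JSON.stringify(body),
//   });
```

Keeping the body construction separate from the network call makes it easy to log the exact JSON being sent when a request is rejected.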
BTW, which programming language are you using? It seems that you’re not using an SDK to solve your tasks…
Cheers
Hey, sorry for the late reply, and thanks for your response, but the above snippet doesn’t work for me either; I still get the 400 error. Can you please show me how you first upload PDF documents using the REST API? Maybe I’m doing something wrong there.
Also, I’m using JavaScript, but the SDK doesn’t quite work for me when using streaming content generation in my app.
Thanks!
Hey @jkirstaetter, I’m waiting for your response. I can’t get this working and don’t know what’s going wrong; maybe your REST API request for uploading PDFs can help me.
It would be really appreciated.
Hey @Bhargav, this colab gist might help.
Hi,
To upload to the File API I’m using a multipart request which contains the JSON and the (stream of) binary information of the file. Here’s what my code in .NET looks like.
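// Requires: using System.IO; using System.Net.Http; using System.Net.Http.Headers; using System.Text;
// uri, json, mimeType, totalBytes and Constants are defined elsewhere in the calling code.
using var fs = new FileStream(uri, FileMode.Open);
// multipart/related: the first part is the JSON metadata, the second is the binary file.
var multipartContent = new MultipartContent("related");
multipartContent.Add(new StringContent(json, Encoding.UTF8, Constants.MediaType));
multipartContent.Add(new StreamContent(fs, (int)Constants.ChunkSize)
{
    Headers = {
        ContentType = new MediaTypeHeaderValue(mimeType),
        ContentLength = totalBytes
    }
});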
This is sent to the following endpoint: https://generativelanguage.googleapis.com/upload/v1beta/files?alt=json&uploadType=multipart
The JSON part is quite simple, containing the display name only.
{
  "file" : {
    "displayName" : "Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context"
  }
}
However, that’s probably not something you can use outside of .NET.
The Gemini Cookbook repository has samples for Python, JavaScript, and shell scripting that upload a file to the File API in chunks, which might be more interesting for you.
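For Node.js, the same multipart/related request could be put together by hand roughly as follows. This is an untested sketch that mirrors the .NET approach above: `buildMultipartRelated` and `uploadFile` are hypothetical helper names, the `x-goog-api-key` header is an assumption, and the response shape is not confirmed in this thread. It also reads the whole file into memory, so it is not the chunked approach from the Cookbook samples.

```javascript
// Sketch: upload a local file to the File API with a multipart/related request.
// buildMultipartRelated and uploadFile are hypothetical helper names.

// Assemble the two-part body: JSON metadata first, then the raw file bytes.
function buildMultipartRelated(metadata, fileBytes, mimeType,
                               boundary = "upload-" + Date.now().toString(16)) {
  const head = Buffer.from(
    `--${boundary}\r\n` +
    "Content-Type: application/json; charset=utf-8\r\n\r\n" +
    JSON.stringify(metadata) + "\r\n" +
    `--${boundary}\r\n` +
    `Content-Type: ${mimeType}\r\n\r\n`,
    "utf8"
  );
  const tail = Buffer.from(`\r\n--${boundary}--\r\n`, "utf8");
  return {
    contentType: `multipart/related; boundary=${boundary}`,
    body: Buffer.concat([head, Buffer.from(fileBytes), tail]),
  };
}

// Read the file from disk and POST it to the upload endpoint quoted above.
async function uploadFile(path, mimeType, displayName, apiKey) {
  const { readFile } = await import("node:fs/promises");
  const bytes = await readFile(path);
  const { contentType, body } = buildMultipartRelated(
    { file: { displayName } }, bytes, mimeType
  );

  const res = await fetch(
    "https://generativelanguage.googleapis.com/upload/v1beta/files?alt=json&uploadType=multipart",
    {
      method: "POST",
      headers: { "Content-Type": contentType, "x-goog-api-key": apiKey },
      body,
    }
  );
  if (!res.ok) {
    throw new Error(`Upload failed: HTTP ${res.status} ${await res.text()}`);
  }
  return res.json(); // expected to describe the uploaded file, including its URI
}
```

In a browser-backed app, the bytes from the user’s File object (for example via `await file.arrayBuffer()`) could be passed to `buildMultipartRelated` instead of reading from disk.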
Cheers
Hey, thanks for the response. I’ll try it and update you.