How do I process an uploaded image into multimodal image content without using PIL in Python?

My Python REST API bootstrap template sends a POST request via fetch to a blueprint endpoint as multipart/form-data. How do I turn that files["image"] into the contents field of a multimodal request to the Gemini API without using PIL Image? Text generation  |  Gemini API  |  Google AI for Developers doesn’t show how to do that.

This is the content of the POSTed files["image"] received at my REST API endpoint:

image: <FileStorage: '1.jpg' ('image/jpeg')>, <class 'quart.datastructures.FileStorage'>
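For context, the endpoint receives it roughly like this (a minimal sketch; the blueprint name and route are illustrative):

```python
# Sketch of the receiving Quart blueprint endpoint; registration omitted.
from quart import Blueprint, request

bp = Blueprint("api", __name__)

@bp.post("/generate")
async def generate():
    files = await request.files
    image = files["image"]  # quart.datastructures.FileStorage
    print(f"image: {image!r}, {type(image)}")
    # ...what goes into the Gemini request's contents field from here?
    return {"status": "ok"}
```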

Currently I hit this error when passing the image object in directly:

Exception: file uri and mime_type are required

The current workflow according to the docs, which doesn’t make sense to me:

(1) Save the uploaded data to local storage.

(2) Have the API SDK read it back from the persistence store (via its URI) to send it over to the model to generate text / media.

(3) Then clean up the unused file data on the storage.
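What I would expect instead is to be able to pass the bytes inline, something like this (an untested sketch assuming the google-genai SDK; Part.from_bytes, the async client surface, and the model name are from my reading of the SDK reference):

```python
# Untested sketch: send the uploaded bytes inline, no PIL, no temp file.
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

async def describe(image):  # image: quart.datastructures.FileStorage
    image_bytes = image.read()  # raw bytes straight from the upload
    response = await client.aio.models.generate_content(
        model="gemini-2.0-flash",  # illustrative model name
        contents=[
            types.Part.from_bytes(data=image_bytes, mime_type=image.mimetype),
            "Describe this image.",
        ],
    )
    return response.text
```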


You can upload the file to the Files API and then pass the returned file object in the contents of the generate call.

You must make sure the file state is ACTIVE before using it.

Somewhere in the documentation there is a note that it is better to pass the file before the text in multimodal input.
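Something along these lines (a sketch using the google-genai SDK; the polling interval and model name are illustrative):

```python
# Sketch: upload via the Files API, wait for ACTIVE, then pass the file
# object in contents, placing the file before the text.
import time
from google import genai

client = genai.Client()

uploaded = client.files.upload(file="1.jpg")  # path-based upload for simplicity

# Files start out in PROCESSING; poll until that state clears.
while uploaded.state.name == "PROCESSING":
    time.sleep(1)
    uploaded = client.files.get(name=uploaded.name)

if uploaded.state.name != "ACTIVE":
    raise RuntimeError(f"file did not become ACTIVE: {uploaded.state.name}")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[uploaded, "Describe this image."],  # file first, then text
)
print(response.text)
```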

I don’t have a multiple-requests / multiple-prompts-with-a-single-media use case. Even if I did, that doesn’t answer my question: how do I send the file data POSTed from the UI via fetch, in flight, directly to the Gemini API, whether to a separate persistence API or to the inference API, without having to persist the data locally, use a local URI in any of the API calls, and later clean up the local storage?
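The closest I can see to avoiding local persistence entirely on the Files API route would be uploading from an in-memory buffer, along these lines (untested; assumes files.upload accepts a file-like object when mime_type is supplied in the config):

```python
# Untested sketch: upload from memory so nothing touches local disk.
import io
from google import genai
from google.genai import types

client = genai.Client()

def upload_in_memory(image):  # image: quart.datastructures.FileStorage
    buffer = io.BytesIO(image.read())  # keep the upload bytes in memory only
    return client.files.upload(
        file=buffer,
        config=types.UploadFileConfig(mime_type=image.mimetype),
    )
```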