How do I process an uploaded image into multimodal image content without using PIL in Python?

My Python REST API bootstrap template sends a POST request via fetch to a blueprint endpoint as multipart/form-data. How do I turn that files["image"] into the contents field of a multimodal request to the Gemini API without using PIL Image? Text generation  |  Gemini API  |  Google AI for Developers doesn’t show how to do that.

This is the content of the POSTed files["image"] received at my REST API endpoint:

image: <FileStorage: '1.jpg' ('image/jpeg')>, <class 'quart.datastructures.FileStorage'>
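For context, the endpoint receives it roughly like this (a minimal sketch; the blueprint name and route are illustrative):

```python
# Sketch of the receiving Quart blueprint endpoint; registration omitted.
from quart import Blueprint, request

bp = Blueprint("api", __name__)

@bp.post("/generate")
async def generate():
    files = await request.files
    image = files["image"]  # quart.datastructures.FileStorage
    print(f"image: {image!r}, {type(image)}")
    # ...what goes into the Gemini request's contents field from here?
    return {"status": "ok"}
```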

Currently I hit this error when passing the image object in directly:

Exception: file uri and mime_type are required

The current workflow according to the docs, which doesn’t make sense to me:

(1) Save the uploaded data to local storage.

(2) Have the API SDK read it back from the persistence store (via its URI) to send it over to the model to generate text / media.

(3) Then clean up the unused file data on the storage.
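What I would expect instead is to be able to pass the bytes inline, something like this (an untested sketch assuming the google-genai SDK; Part.from_bytes, the async client surface, and the model name are from my reading of the SDK reference):

```python
# Untested sketch: send the uploaded bytes inline, no PIL, no temp file.
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

async def describe(image):  # image: quart.datastructures.FileStorage
    image_bytes = image.read()  # raw bytes straight from the upload
    response = await client.aio.models.generate_content(
        model="gemini-2.0-flash",  # illustrative model name
        contents=[
            types.Part.from_bytes(data=image_bytes, mime_type=image.mimetype),
            "Describe this image.",
        ],
    )
    return response.text
```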


You can upload the file to the Files API and then pass the returned file object in the contents of the generate call.

You must make sure the file state is ACTIVE before using it.

Somewhere in the documentation there is a note that it is better to pass the file before the text in multimodal input.
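Something along these lines (a sketch using the google-genai SDK; the polling interval and model name are illustrative):

```python
# Sketch: upload via the Files API, wait for ACTIVE, then pass the file
# object in contents, placing the file before the text.
import time
from google import genai

client = genai.Client()

uploaded = client.files.upload(file="1.jpg")  # path-based upload for simplicity

# Files start out in PROCESSING; poll until that state clears.
while uploaded.state.name == "PROCESSING":
    time.sleep(1)
    uploaded = client.files.get(name=uploaded.name)

if uploaded.state.name != "ACTIVE":
    raise RuntimeError(f"file did not become ACTIVE: {uploaded.state.name}")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[uploaded, "Describe this image."],  # file first, then text
)
print(response.text)
```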

I don’t have a multiple-requests / multiple-prompts-with-a-single-media use case. Even if I did, that doesn’t answer my question: how do I send the file data POSTed from the UI via fetch, in flight, directly to the Gemini API, whether to a separate persistence API or to the inference API, without having to persist the data locally, use a local URI in any of the API calls, and later clean up the local storage?
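The closest I can see to avoiding local persistence entirely on the Files API route would be uploading from an in-memory buffer, along these lines (untested; assumes files.upload accepts a file-like object when mime_type is supplied in the config):

```python
# Untested sketch: upload from memory so nothing touches local disk.
import io
from google import genai
from google.genai import types

client = genai.Client()

def upload_in_memory(image):  # image: quart.datastructures.FileStorage
    buffer = io.BytesIO(image.read())  # keep the upload bytes in memory only
    return client.files.upload(
        file=buffer,
        config=types.UploadFileConfig(mime_type=image.mimetype),
    )
```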