Would the Gemini API through OpenAI SDK support file URI such as images, audio, video, and nontextual PDFs?

I’ve noticed I can just use the OpenAI SDK to use Gemini 1.5 models and it’s an easy to use but what I’ve seen in the documentation is the multimodal support is limited, will the OAI sdk version of Gemini API supports these modalities such as files from public URLs or base64 input maybe in the future if not implemented today?

4 Likes

I also really need this image_url function, but Gemini OpenAI compatible API now only supports passing the base64 image data, which is really not cool!

Hi,

Not at the moment, as explored and answered here: Combining OpenAI-Compatible Gemini Completions with File Uploads - #4 by Will_Powell

The type is not accepted by the Gemini API yet.

Cheers.

Hi @Jie_Zhou

HOW do you pass in the base64 encoded file into the Chat completions?
I’m genuinely interested to know more about that. Both - base64 and URI - would require the content type file which the OpenAI comp endpoints of the Gemini API do not accept.

What am I missing?
What’s your magic sorcery?

Cheers