Combining OpenAI-Compatible Gemini Completions with File Uploads

I’m exploring whether two specific features of Gemini can be used together:

1. Using OpenAI’s Client with Gemini

Gemini supports OpenAI’s client library and OpenAI-style completion format, allowing it to be used similarly to OpenAI’s models. For example:

from openai import OpenAI

client = OpenAI(
    api_key="GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    n=1,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain to me how AI works"}
    ]
)

print(response.choices[0].message)

2. Uploading Files & Referencing Them in Completions

Gemini also supports file uploads, allowing models to analyze images or documents. This is done using Google’s genai client:

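from pathlib import Path

from google import genai

# Assumption: `media` is a local directory that contains the sample image.
media = Path("media")

client = genai.Client()  # picks up the API key from the environment
myfile = client.files.upload(file=media / "Cajun_instruments.jpg")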
print(f"{myfile=}")

result = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        myfile,
        "\n\n",
        "Can you tell me about the instruments in this photo?",
    ],
)
print(f"{result.text=}")

My Question

Can these two features be used together? Specifically, is it possible to upload a file via genai.Client() and then reference it in a completion request using the OpenAI-compatible client format? If anyone has tested this or found workarounds, I’d love to hear your insights!


Please see the docs I am referring to for Files and OpenAI Format.

Hi @Will_Powell

Welcome to the forum.
Interesting use case, and a good way to check whether multimodality is available through the OpenAI compatibility layer. Short answer: No.

I checked the OpenAI Platform docs on files in chat completions (https://platform.openai.com/docs/guides/pdf-files?api-mode=chat) and assembled the same REST call using the file resource provided by the File API. The result looks roughly like this:

POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions
Authorization: Bearer AIza...
Content-Type: application/json; charset=utf-8

{
  "model": "gemini-2.0-flash",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant in the field of space travelling."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "file",
          "file": {
            "file_id": "files/3c02j9q8oge7"
          }
        },
        {
          "type": "text",
          "text": "Explain to me how the Apollo lunar module works"
        }
      ]
    }
  ]
}

Providing an array of content parts is no problem; it’s the "type": "file" part that doesn’t make it through.
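
For anyone who wants to reproduce this from Python instead of a REST client, the request body above can be assembled roughly as follows. This is only a sketch: build_file_chat_payload is a hypothetical helper name (not part of any SDK), it uses the standard library only, and the file ID is simply the one from the request above.

```python
import json


def build_file_chat_payload(model: str, system_prompt: str, file_id: str, question: str) -> dict:
    """Build an OpenAI-style chat completions body that references an uploaded file.

    Hypothetical helper for illustration; it only assembles the JSON body.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {
                "role": "user",
                "content": [
                    # OpenAI-style file part: this is the part type the
                    # compatibility endpoint did not accept in the test above.
                    {"type": "file", "file": {"file_id": file_id}},
                    {"type": "text", "text": question},
                ],
            },
        ],
    }


payload = build_file_chat_payload(
    model="gemini-2.0-flash",
    system_prompt="You are a helpful assistant in the field of space travelling.",
    file_id="files/3c02j9q8oge7",
    question="Explain to me how the Apollo lunar module works",
)
print(json.dumps(payload, indent=2))
```

The printed JSON can be sent as the body of the POST request shown above, with the same Authorization and Content-Type headers.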

Here’s the response from the Gemini 2.0 Flash model.

{
  "error": {
    "code": 400,
    "message": "Invalid content part type: file",
    "status": "INVALID_ARGUMENT"
  }
}
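
If you want client code to notice this case and fall back to the native genai path, one option is to inspect the error body. A minimal sketch, assuming the error JSON has exactly the shape shown above; is_unsupported_part_error is a hypothetical helper, and matching on the message text is brittle because the wording may change.

```python
import json


def is_unsupported_part_error(body: str) -> bool:
    """Return True if a JSON error body reports a rejected content part type.

    Hypothetical helper; assumes the error shape from the response above.
    """
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        return False
    error = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(error, dict):
        return False
    return (
        error.get("code") == 400
        and error.get("status") == "INVALID_ARGUMENT"
        and "Invalid content part type" in str(error.get("message", ""))
    )


# The error body returned in the test above, as a single-line string.
sample = (
    '{"error": {"code": 400, '
    '"message": "Invalid content part type: file", '
    '"status": "INVALID_ARGUMENT"}}'
)
print(is_unsupported_part_error(sample))
```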

Cheers.

PS: @GUNAND_MAYANGLAMBAM @Vishal - any roadmap to bring multi-modality to the OpenAI compatibility endpoints?

Thank you for your quick response. Yes, my hunch was that this would not be possible, at least for now. I will stick to using the genai library for the time being. All the best.