OpenAI comp - Image generation and default values


Hi @GUNAND_MAYANGLAMBAM and @Vishal

Found another curiosity in the OpenAI compatibility. This time, it’s related to the image generations endpoint. According to the OpenAI API reference, several parameters are optional and have default values.

Request

Default values

Here’s the basic REST call with required values only.

POST https://generativelanguage.googleapis.com/v1beta/openai/images/generations
Authorization: Bearer AIza...
Content-Type: application/json; charset=utf-8

{
  "prompt": "Photorealistic shot in the style of DSLR camera of the northern lights dancing across the Arctic sky, stars twinkling, snow-covered landscape",
  "model": "imagen-3.0-generate-002"
}

This gives me an HTTP 400 Bad Request with the message "Unsupported response format: <empty>. Supported formats are: b64_json."
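For anyone who wants to reproduce this from a script, here is a minimal Python sketch of the same call with response_format set explicitly, which should avoid the 400 above. It only builds the request and does not send it. The function name build_image_request and the placeholder key are my own illustrative choices, not part of either API.

```python
import json
import urllib.request

ENDPOINT = "https://generativelanguage.googleapis.com/v1beta/openai/images/generations"


def build_image_request(api_key: str, prompt: str,
                        model: str = "imagen-3.0-generate-002") -> urllib.request.Request:
    """Build (but do not send) the POST request with response_format set."""
    payload = {
        "prompt": prompt,
        "model": model,
        # The endpoint rejects a missing value, so send the only
        # supported format explicitly instead of relying on a default.
        "response_format": "b64_json",
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out here so the sketch runs without a key.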

Response Format - missing and case sensitive

However, the OpenAI reference says "Must be one of url or b64_json. Defaults to url". That default is an actual problem, as the Gemini API does not offer url as a response option at all.

Setting ResponseFormat in upper case again gives an HTTP 400, this time with "Unsupported response format: B64_JSON. Supported formats are: b64_json." :face_holding_back_tears:

OK, perhaps I’m asking for too much, although the Gemini API is usually quite tolerant, e.g. camelCase vs. snake_case. Anyway, that was easy to fix by adding a lower-case conversion to the code.
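The client-side workaround could look roughly like the following Python sketch. It is not the actual library code; normalize_response_format and SUPPORTED_FORMATS are hypothetical names, and the set of supported values is taken only from the error message quoted above.

```python
from typing import Optional

# Taken from the error message; an assumption that may change server-side.
SUPPORTED_FORMATS = {"b64_json"}


def normalize_response_format(value: Optional[str]) -> str:
    """Lower-case the value and fall back to b64_json when it is missing."""
    if value is None or not value.strip():
        # OpenAI would default to "url", which the endpoint does not offer.
        return "b64_json"
    normalized = value.strip().lower()
    if normalized not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported response format: {value!r}")
    return normalized
```

Raising locally for url gives a clearer error than waiting for the HTTP 400.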

Quality

Interestingly, this parameter is not case-sensitive.
Given the high similarity of the generated pairs of images I suspect that this property is not applied in the Gemini API (yet).

Size

Also, this parameter is not case-sensitive.
But the discrete values are not applied. The returned size seems to be fixed at 1024x1024, without the options that OpenAI offers: "Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2. Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models."

Style

And this one… is not case-sensitive.
Given the high similarity of the generated pairs of images I suspect that this property is not applied in the Gemini API (yet).

User

Accepted nicely. :+1:

Response

Created

It seems that the current response structure does not provide the created property. According to the OpenAI API Reference, the response returns a data resource, which seems to be an Image resource; see next.

RevisedPrompt property?

Is the Gemini API going to provide all properties of the Image resource? https://platform.openai.com/docs/api-reference/images/object
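Until that is settled, a client probably has to treat created and revised_prompt as optional. Below is a small Python sketch of tolerant parsing. It assumes the body follows OpenAI's documented shape with a data array whose items carry b64_json; parse_images_response is a hypothetical helper name.

```python
import base64
import json


def parse_images_response(body: str):
    """Return (created, images), tolerating absent optional properties."""
    doc = json.loads(body)
    created = doc.get("created")  # None when the property is not returned
    images = []
    for item in doc.get("data", []):
        images.append({
            # Decode the base64 payload into raw image bytes.
            "bytes": base64.b64decode(item["b64_json"]),
            # Optional in the Image resource; None when not provided.
            "revised_prompt": item.get("revised_prompt"),
        })
    return created, images
```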

What’s expected?

Well, for a start it would be nice if ResponseFormat were case-insensitive and the missing url option were provided. I’m not sure, though, how those generated images should be made available. Maybe via the File API, or similar to GeneratedFiles in Vertex AI?

Supporting the Size values of OpenAI, or at least those provided by Imagen 3, would also help.

Complete REST call

For testing and debugging purposes, here’s the complete call.

POST https://generativelanguage.googleapis.com/v1beta/openai/images/generations
Authorization: Bearer AIza...
Content-Type: application/json; charset=utf-8

{
  "prompt": "Photorealistic shot in the style of DSLR camera of the northern lights dancing across the Arctic sky, stars twinkling, snow-covered landscape",
  "model": "imagen-3.0-generate-002",
  "n": 1,
  "quality": "STANDARD",
  "responseFormat": "b64_json",
  "size": "1024x1024",
  "style": "NATURAL",
  "user": "Mscc.GenerativeAI"
}

I created a test suite with all 20 permutations (2x2x5), and they are all successful, each returning HTTP 200 OK.
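For illustration, a set of 20 payloads like that could be generated with a Python sketch along these lines. The post does not spell out which values make up 2x2x5, so this assumes OpenAI's two quality values, two style values, and the five sizes quoted in the Size section. build_payloads is a hypothetical helper, not the actual test suite.

```python
from itertools import product

# Assumed dimensions of the 2x2x5 matrix (not confirmed above).
QUALITIES = ["standard", "hd"]
STYLES = ["vivid", "natural"]
SIZES = ["256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"]


def build_payloads(prompt: str, model: str = "imagen-3.0-generate-002"):
    """Return one request payload per quality/style/size combination."""
    return [
        {
            "prompt": prompt,
            "model": model,
            "n": 1,
            "quality": quality,
            "response_format": "b64_json",
            "size": size,
            "style": style,
        }
        for quality, style, size in product(QUALITIES, STYLES, SIZES)
    ]
```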

Cheers


Quick update: ResponseFormat url won’t be supported according to @GUNAND_MAYANGLAMBAM as mentioned here: OpenAI comp: Image generation

Hmm, kind of expected, as this requires the ability to download the generated file. That should have been possible with Vertex AI, though, using a designated GCS bucket for Gemini. Or even with Google AI, using the user’s Drive space, as it already uses it…

Hello @GUNAND_MAYANGLAMBAM

How about mapping OpenAI’s Size attribute (internally) to Gemini’s AspectRatio? I mean, the ratios of the size dimensions match those values neatly.
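To make the idea concrete, here is a rough Python sketch of such a mapping. The list of aspect ratios is my assumption of what Imagen 3 accepts, and size_to_aspect_ratio is a hypothetical helper. It picks the nearest ratio rather than an exact one, because 1792x1024 reduces to 7:4, which is close to but not exactly 16:9.

```python
# Assumed set of aspect ratios accepted by Imagen 3.
SUPPORTED_RATIOS = {
    "1:1": 1 / 1,
    "3:4": 3 / 4,
    "4:3": 4 / 3,
    "9:16": 9 / 16,
    "16:9": 16 / 9,
}


def size_to_aspect_ratio(size: str) -> str:
    """Map an OpenAI size such as '1792x1024' to the nearest aspect ratio."""
    width, height = (int(part) for part in size.lower().split("x"))
    target = width / height
    # Choose the supported ratio with the smallest absolute difference.
    return min(SUPPORTED_RATIOS, key=lambda name: abs(SUPPORTED_RATIOS[name] - target))
```

With this, the square sizes map to 1:1, 1792x1024 to 16:9, and 1024x1792 to 9:16.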

Just a thought.

Cheers.

Hey @jkirstaetter, you are right: currently, the default size is 1024x1024. Even if we change the aspect ratio, the change is not reflected in the output.
Let me get back to you on this.

Thanks!
