OpenAI compatibility + multimodal?

I want to adapt my current REST requests for multimodal Gemini models (e.g. gemini-2.0-flash-exp-image-generation) to the OpenAI compatibility endpoint, but then of course I can no longer pass the “responseModalities” parameter (it is not supported in the OpenAI library), which I need in order to make the responses multimodal (in my case, text and/or image).
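For context, my current native REST call looks roughly like this (the API key and prompt are placeholders):

```python
import requests

API_KEY = "YOUR_GEMINI_API_KEY"
URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "gemini-2.0-flash-exp-image-generation:generateContent"
)

payload = {
    "contents": [{"parts": [{"text": "Draw a red bicycle and describe it."}]}],
    "generationConfig": {
        # This is the field I cannot express through the OpenAI library.
        "responseModalities": ["TEXT", "IMAGE"]
    },
}

resp = requests.post(URL, params={"key": API_KEY}, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```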

Is there any workaround I am missing, or are there plans to integrate native Gemini API parameters into the OpenAI compatibility layer in the future?

Thanks

Hi @Javier_De_Pedro_Lope,

Welcome to the forum!

The Gemini API provides OpenAI REST library compatibility. Please refer to OpenAI compatibility | Gemini API | Google AI for Developers.

Thank you!

Hi @Javier_De_Pedro_Lope,

You are right. The OpenAI compatibility endpoint currently does not support the responseModalities parameter.

Here are some workarounds:

- Make separate API calls: one for text responses via the OpenAI-compatible endpoint and another to the native Gemini API for image generation. This lets you handle the multimodal outputs separately (see the sketch after this list).
- Alternatively, build a custom integration that mimics the responseModalities behaviour by processing and combining the text and image outputs from those distinct API calls.
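To make the first option concrete, here is a minimal sketch assuming the openai and google-genai Python packages; the model names follow your example, while the prompts, output file name, and combination logic are just placeholders:

```python
from openai import OpenAI
from google import genai
from google.genai import types

API_KEY = "YOUR_GEMINI_API_KEY"

# 1) Text via the OpenAI-compatible endpoint (no responseModalities here).
oai = OpenAI(
    api_key=API_KEY,
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)
text_resp = oai.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Describe a red bicycle in one sentence."}],
)
description = text_resp.choices[0].message.content

# 2) Image via the native Gemini API, where responseModalities is supported.
gclient = genai.Client(api_key=API_KEY)
image_resp = gclient.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",
    contents=f"Generate an image of: {description}",
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# Combine the two results yourself: save inline image bytes, print any text parts.
for part in image_resp.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("bicycle.png", "wb") as f:
            f.write(part.inline_data.data)
    elif part.text is not None:
        print(part.text)
```

The trade-off is two round trips and stitching the outputs together yourself, but it keeps responseModalities available for the image step.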

Please monitor updates from the Gemini API team, as multimodal support may be added to future versions of the OpenAI compatibility endpoint. I will also escalate this as a feature request to the relevant team.

In the meantime, please keep an eye on the release notes and the OpenAI compatibility doc mentioned by @Mrinal_Ghosh for future updates.