I’m planning a pilot; can you maybe give me an idea of when multimodal is going to be released? Not an exact date, but rather: will it be weeks? Months? Half a year?
Many thanks!
Hi @IKGN
Not sure exactly what you are asking for, but multimodal responses are already available, at least in the Gemini 2.0 Flash models.
See here: Generate images | Gemini API | Google AI for Developers
Cheers.
Text-and-image generation output has been released for experimental use. Try it out via Google AI Studio: just choose Gemini 2.0 Flash Experimental.
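If you'd rather hit the API directly than use AI Studio, here's a minimal sketch, assuming the `google-genai` Python SDK (`pip install google-genai`) and an API key in your environment. The model name and `response_modalities` field follow the "Generate images" doc linked above; since this is experimental, details may change.

```python
# Minimal sketch: text + image output from Gemini 2.0 Flash Experimental.
# Assumes the google-genai SDK and a GEMINI_API_KEY environment variable.
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()  # picks up the API key from the environment

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Draw a watercolor lighthouse and describe the scene.",
    config=types.GenerateContentConfig(
        # Request interleaved text and image parts in the response.
        response_modalities=["TEXT", "IMAGE"],
    ),
)

# The response interleaves text parts and inline image data.
for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("lighthouse.png")
```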
Sorry, I meant the Multimodal Live API, i.e. voice-to-voice: Multimodal Live API | Gemini API | Google AI for Developers. That’s still experimental and restrictions apply (3 concurrent sessions per API key, etc.). I’ve been trying it out for three months now and would like to scale up a bit.
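For context, this is roughly the kind of session I’m running, as a minimal sketch with the `google-genai` SDK’s async client. The session methods have shifted between experimental releases, so treat the exact calls below as illustrative rather than authoritative (real voice-to-voice would stream microphone audio in; I’m sending text and receiving audio here for brevity).

```python
# Minimal sketch of a Multimodal Live API session (experimental).
# Assumes the google-genai SDK and a GEMINI_API_KEY environment variable.
import asyncio

from google import genai

client = genai.Client()  # picks up the API key from the environment

MODEL = "gemini-2.0-flash-exp"
CONFIG = {"response_modalities": ["AUDIO"]}  # ask for spoken responses

async def main():
    # Opens a bidirectional websocket session; this counts toward the
    # 3-concurrent-sessions-per-key limit mentioned above.
    async with client.aio.live.connect(model=MODEL, config=CONFIG) as session:
        await session.send(input="Hello, can you hear me?", end_of_turn=True)
        async for response in session.receive():
            if response.data is not None:  # raw audio bytes from the model
                print(f"received {len(response.data)} bytes of audio")

asyncio.run(main())
```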