According to the Gemini 1.5 Pro Technical Report, the model is designed to be natively multimodal, which I take to mean it can process both video and audio inputs directly (at minimum, videos of 10+ minutes, without first extracting text from the video, etc.).
Does this iteration continue the practice of using distinct models for vision and audio, such as Gemini Vision and Gemini Audio, respectively? I'm eager to find out whether the Gemini API instead consolidates these capabilities, so a single endpoint can handle text, audio, and video inputs.
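For context, this is the kind of single consolidated call I'm hoping is possible. A rough sketch of building one mixed-part request body; the `contents`/`parts`/`inline_data` shape is my assumption based on the public Gemini REST docs, and whether 1.5 Pro accepts long video this way is exactly what I'm asking:

```python
import base64


def build_multimodal_request(prompt, media_path, mime_type):
    """Build one generateContent-style request mixing text with raw media.

    The {"contents": [{"parts": [...]}]} shape follows the public Gemini
    REST documentation; field names here are an assumption on my part,
    not confirmed for 1.5 Pro specifically.
    """
    with open(media_path, "rb") as f:
        # Inline media is base64-encoded alongside the text part.
        media_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": media_b64}},
                ]
            }
        ]
    }
```

If the API really is unified, I'd expect to POST a body like this (with, say, `mime_type="video/mp4"`) to a single model endpoint rather than routing vision and audio separately.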