Can Gemini Analyze from Voices and Videos?

Mood_Nice · June 25, 2024, 11:29pm

OrangiaNebula · June 26, 2024, 12:33am

Welcome to the forum.

Yes, the 1.5 models are multimodal. To help you get started, there are examples to try out here: Prompting with media files | Gemini API | Google for Developers (other languages besides Python are also available).

Mood_Nice · June 28, 2024, 11:29pm

Yes! Thank you.
I have another question: if the file is a PDF, PPTX, DOCX, or another type of file, how can I upload it?

The upload function (upload_files) does not support DOCX, PPTX, PDF, or other document types; it only handles videos, images, and audio.

HatKid · October 23, 2024, 1:00am

I doubt support will be coming soon. However, it would be really simple to add, as Microsoft Office documents are just ZIP files. The API just needs to allow ZIPs, and then Office docs should work relatively well (as long as you teach it all of the syntax).
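To illustrate the "Office documents are just ZIP files" point: a minimal sketch, using only the Python standard library, that opens a .docx as a ZIP archive and pulls the paragraph text out of word/document.xml. The function name docx_to_text is made up for this example, and the sketch ignores tables, headers, footnotes, and images, so treat it as a starting point rather than a full converter.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the paragraph text of a .docx file (hypothetical helper).

    `source` may be a file path or a binary file-like object.
    """
    # A .docx is a ZIP archive; the main body lives in word/document.xml.
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph is split into runs; join the text of every <w:t> in it.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)
```

The resulting string could then be sent to the model as ordinary text alongside a question, which sidesteps the upload restriction for simple documents.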

Mood_Nice · October 31, 2024, 8:32am

I think Google's AI first reads these files using Python code, and then feeds the extracted content to the model, so the model can answer anything as long as it is text, image, video, or voice!

But so far the AI cannot analyze ZIP files!

As for me, I've built a program with this API that can analyze literally anything.
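For anyone wanting to try the read-first-then-feed idea, here is a rough sketch of how it might look. It is not the program mentioned above and not how Google handles files internally; the names pptx_to_text, file_to_text, and build_prompt are all hypothetical. It picks a text extractor by file extension and wraps the result in a prompt string. The actual call to the model is left out on purpose.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespace of DrawingML text elements (<a:t>) used inside PPTX slides.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def pptx_to_text(path):
    """Return one line of text per slide of a .pptx file (hypothetical helper)."""
    with zipfile.ZipFile(path) as archive:
        names = [n for n in archive.namelist()
                 if re.fullmatch(r"ppt/slides/slide\d+\.xml", n)]
        # Sort by the number in the file name; note this is file-number order,
        # which may differ from the order shown in the presentation.
        names.sort(key=lambda n: int(re.search(r"\d+", n.rsplit("/", 1)[1]).group()))
        slides = []
        for name in names:
            root = ET.fromstring(archive.read(name))
            slides.append(" ".join(t.text or "" for t in root.iter(A_NS + "t")))
    return "\n".join(slides)


def file_to_text(path):
    """Dispatch on the file extension and return the file's text."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".pptx":
        return pptx_to_text(path)
    if suffix in {".txt", ".md", ".csv"}:
        return path.read_text(encoding="utf-8")
    raise ValueError(f"no text extractor for {suffix!r}")


def build_prompt(question, path):
    """Combine a question and the extracted file text into one prompt string."""
    return f"{question}\n\n--- contents of {Path(path).name} ---\n{file_to_text(path)}"
```

The string returned by build_prompt would be passed to the model as a normal text prompt. Formats without an extractor raise ValueError here, so new types (PDF, XLSX, and so on) would each need their own reader.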
