Can Gemini Analyze Voices and Videos?
Welcome to the forum.
Yes, the 1.5 models are multimodal. To help you get started, there are examples to try out here: Prompting with media files | Gemini API | Google for Developers (other languages besides Python are also available).
Yes! Thank you.
I have another question: if the file is a PDF, PPTX, DOCX, or some other file type, how can I upload it?
The upload_files function does not support DOCX, PPTX, PDF, or other document types; it only handles videos, images, and audio.
I doubt support will be coming soon. However, it would be fairly simple to add, since Microsoft Office documents are just ZIP files. The API would only need to accept ZIPs, and Office docs should then work relatively well (as long as you teach the model all of the syntax).
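To illustrate the "Office documents are just ZIP files" point, here is a minimal sketch that pulls the plain text out of a DOCX using only the Python standard library. The function name docx_to_text is made up for this example, and it assumes the main body lives in word/document.xml, which is the usual layout for Word files. It ignores headers, footnotes, and images, so treat it as a starting point rather than a full parser.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the text of a .docx file, one line per paragraph.

    `source` may be a file path or a binary file-like object.
    (Hypothetical helper for illustration, not part of any Gemini SDK.)
    """
    # A .docx is a ZIP archive; the document body is one XML member.
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Text runs inside a paragraph are stored in <w:t> elements.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)
```

The returned string can then be sent to the model as an ordinary text prompt. PPTX and XLSX are ZIP archives too, but with different internal paths and XML tags, so each would need its own small reader.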
I think Google's AI first reads these files using Python code, and after reading them it feeds the contents to the model, so the model can answer anything whether the input is text, image, video, or voice!
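The "read the file with Python first, then feed it to the model" idea could be sketched as a small router that decides what to do with each file. This is only a guess at one possible design, not how Google actually does it. The function name choose_route and the three route labels are invented for this example.

```python
import mimetypes
from pathlib import Path

# Office formats that are ZIP archives and can be unpacked locally.
OFFICE_EXTENSIONS = {".docx", ".pptx", ".xlsx"}
# Media families the thread says the upload function accepts.
MEDIA_PREFIXES = ("video/", "image/", "audio/")


def choose_route(filename):
    """Pick a handling strategy for a file based on its name.

    Returns "extract_text" for Office documents, "upload" for media,
    and "unsupported" for everything else. (Hypothetical helper.)
    """
    suffix = Path(filename).suffix.lower()
    if suffix in OFFICE_EXTENSIONS:
        return "extract_text"

    # guess_type returns (mime_type, encoding); mime_type may be None.
    mime, _ = mimetypes.guess_type(filename)
    if mime and mime.startswith(MEDIA_PREFIXES):
        return "upload"
    return "unsupported"
```

For example, choose_route("talk.mp4") selects the upload path, while choose_route("report.docx") selects local text extraction. PDF is not a ZIP format, so it would need a separate parser and falls under "unsupported" in this sketch.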
But so far, the AI cannot analyze ZIP files!
As for me, I've built a program with this API that can analyze literally anything.