Can Gemini Analyze Voices and Videos?

Welcome to the forum.

Yes, the Gemini 1.5 models are multimodal. To help you get started, there are examples to try out in "Prompting with media files" in the Gemini API documentation on Google for Developers (languages other than Python are also available).
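For a rough idea of what those examples look like, here is a minimal sketch using the `google-generativeai` Python SDK. It assumes the SDK is installed, that an API key is stored in a `GOOGLE_API_KEY` environment variable, and that `gemini-1.5-flash` is the model you want. The helper name `describe_media` is made up for illustration and is not part of the SDK.

```python
import os
from pathlib import Path


def describe_media(path: str, prompt: str, model_name: str = "gemini-1.5-flash") -> str:
    """Upload one media file and ask a Gemini model a question about it.

    Hypothetical helper for illustration; not part of the SDK.
    """
    if not Path(path).is_file():
        raise FileNotFoundError(path)

    # Imported lazily so the path check above works even without the SDK installed.
    import google.generativeai as genai

    # Assumption: the API key lives in the GOOGLE_API_KEY environment variable.
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

    # Upload the file, then pass the returned file reference alongside the text prompt.
    uploaded = genai.upload_file(path=path)
    # Note: large videos may still be processing right after upload;
    # the official examples show how to wait for that to finish.
    model = genai.GenerativeModel(model_name)
    response = model.generate_content([uploaded, prompt])
    return response.text
```

You would call it with something like `describe_media("interview.mp3", "Summarize this recording.")`, where the file name is a placeholder for your own audio or video file.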

Yes! Thank you.
I have another question: if the file is a PDF, PPTX, DOCX, or another file type, how can I upload it?

The upload_files function does not support DOCX, PPTX, PDF, or other document types; it only works for videos, images, and audio.
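If the upload function really is limited to media, one way to avoid surprises is to check the file type before uploading. The sketch below uses only Python's standard `mimetypes` module. The rule "image, video, or audio only" comes from the observation in this post, not from a confirmed API limit, and `looks_like_media` is a hypothetical helper name.

```python
import mimetypes

# Assumption (taken from this thread, not from the API reference):
# only image, video, and audio files are accepted for upload.
MEDIA_PREFIXES = ("image/", "video/", "audio/")


def looks_like_media(path: str) -> bool:
    """Return True if the file extension maps to an image, video, or audio MIME type."""
    mime_type, _encoding = mimetypes.guess_type(path)
    # guess_type returns None for unknown extensions; treat those as non-media.
    return mime_type is not None and mime_type.startswith(MEDIA_PREFIXES)


for name in ["clip.mp4", "photo.png", "report.pdf"]:
    status = "media" if looks_like_media(name) else "not media"
    print(f"{name}: {status}")
```

Files that fail this check (PDF, DOCX, PPTX) would need another route, for example extracting their text first and sending that as part of the prompt.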