I was recently trying out Gemini 1.5 to see how its multi-modal support worked, and it was very impressive.
However, it was able to tell me things about a video clip I took that I did not think were available from the video frames alone, such as where I shot the video, even though there were no identifying signs. When I asked the model how it determined the video was taken in Norway, it responded: “There are a few clues in the video that suggest it was taken in Norway. First, the scenery is very mountainous, and Norway is known for its mountains. Second, the lake in the video is frozen, and Norway has a cold climate that would allow for lakes to freeze. Finally, the small town at the end of the video has a very Scandinavian look to it, with wooden houses and a simple design”. This is very impressive, but is it only using the video image data, or is it also looking at video metadata that might tag the location?
I plan to inspect the file and strip out the metadata to test this, but thought I would first ask if anyone knows.
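For the inspection step, here is a minimal sketch of one way to check for embedded GPS tags. It assumes the clip is a QuickTime/MP4 file from a phone, which typically stores location as an ISO 6709 string (e.g. `+59.9139+010.7522/`) under the `com.apple.quicktime.location.ISO6709` key; the scan is a byte-level heuristic rather than a proper atom parser, so it is not guaranteed to catch every container format:

```python
import re

# ISO 6709 coordinate strings, e.g. b"+59.9139+010.7522/" or with an
# optional altitude component, as commonly embedded by phone cameras.
ISO6709_RE = re.compile(rb"[+-]\d{1,2}\.\d+[+-]\d{1,3}\.\d+(?:[+-]\d+\.?\d*)?/")

def find_location_tags(path):
    """Scan a video file's raw bytes for ISO 6709 GPS strings.

    Returns a list of matched coordinate strings (empty if none found).
    This is a heuristic sketch: it reads the whole file into memory and
    does not walk the MP4 box structure.
    """
    with open(path, "rb") as f:
        data = f.read()
    return [m.decode("ascii") for m in ISO6709_RE.findall(data)]
```

If a tag turns up, stripping all metadata without re-encoding can be done with ffmpeg: `ffmpeg -i in.mp4 -map_metadata -1 -c copy out.mp4`. Re-running the model on the stripped file would show whether the Norway answer really came from the frames alone.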