Video understanding with ultrawide aspect ratio crops frames

We’re using the Gemini API (both 2.5 Flash and 2.5 Pro) for video understanding and have noticed an issue when sending videos (via the Files API or base64-encoded) with ultrawide aspect ratios (e.g. 16:3). The model appears to centre-crop the frames to a 16:9 aspect ratio, so the content on either side of the frame is omitted.

I can’t find any guidance on why this happens, or on best practices for aspect ratios when sending video content. Is this the expected behaviour and, if so, is it documented anywhere?

If I send a single 16:3 frame as an image, there are no issues with understanding the full content, so the problem appears to be specific to video and to how the frames are extracted and sent on the API side.
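To put a number on how much of the frame would be lost, here is a rough sketch of the arithmetic. It assumes the frames are centre-cropped to 16:9 with the full height kept, which is only inferred from the behaviour described above and is not documented anywhere I can find. The function name and the 1920x360 resolution are made up for illustration.

```javascript
// Assumption: frames are centre-cropped to a target aspect ratio with the
// full height kept. This is inferred from observed behaviour, not from any
// documented API behaviour.
function centreCropRegion(width, height, targetAspect = 16 / 9) {
  const sourceAspect = width / height;
  if (sourceAspect <= targetAspect) {
    // Source is not wider than the target, so nothing is cut off the sides.
    return { x: 0, width, keptFraction: 1 };
  }
  // Width that survives the crop, and the left offset where it starts.
  const keptWidth = Math.round(height * targetAspect);
  const x = Math.round((width - keptWidth) / 2);
  return { x, width: keptWidth, keptFraction: keptWidth / width };
}

// Hypothetical 1920x360 (16:3) video.
const region = centreCropRegion(1920, 360);
console.log(region);
```

Under that assumption only the middle third of a 16:3 frame would survive, which would be consistent with content on the left and right going missing.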

You can also replicate the issue in AI Studio: upload a 16:3 video and ask what's on the left or right (assuming there is something of note there), and it will not be able to answer accurately.

You can see the issue replicated in AI Studio below:

Hello,

Welcome to the Forum!

Could you please share your code along with some sample images that you are using? This will help us reproduce the issue on our end and analyze it more effectively.

Hi Lalit,

You can reproduce the issue using the following script (or directly in AI Studio):

import { GoogleGenerativeAI } from "@google/generative-ai";
import * as fs from "node:fs";

// The API key is read from the GOOGLE_API_KEY environment variable.
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });

// Send the 16:3 test video inline as base64, followed by the text prompt.
const contents = [
  {
    inlineData: {
      mimeType: "video/mp4",
      data: fs.readFileSync("16x3.mp4", { encoding: "base64" }),
    },
  },
  { text: "What letters are visible in this video?" },
];

const result = await model.generateContent(contents);
console.log(result.response.text());

Both gemini-2.5-flash and gemini-2.5-pro produce inaccurate results:

gemini-2.5-flash:

The letter “B” is visible throughout the video.

gemini-2.5-pro:

The only letter visible in the video is B. It is shown in black against a white background for the entire duration of the clip.

You can use these test videos to reproduce the issue:
16x3.mp4 - doesn't work
16x9.mp4 - works as expected
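
To run both test videos through an identical request, the parts array can be factored into a small helper. This is only a sketch: `buildVideoParts` is a made-up name, not part of the SDK, and it simply produces the same inline base64 payload shape as the script above.

```javascript
// Hypothetical helper (not part of the SDK): builds the inline video part and
// the text part from raw video bytes and a prompt.
function buildVideoParts(videoBytes, prompt, mimeType = "video/mp4") {
  return [
    {
      inlineData: {
        mimeType,
        // Accepts a Buffer or Uint8Array and encodes it as base64.
        data: Buffer.from(videoBytes).toString("base64"),
      },
    },
    { text: prompt },
  ];
}

// Tiny stand-in payload so the output shape is easy to inspect.
const parts = buildVideoParts(Buffer.from("abc"), "What letters are visible in this video?");
console.log(JSON.stringify(parts, null, 2));
```

With `fs` and `model` set up as in the script, calling `model.generateContent(buildVideoParts(fs.readFileSync(name), prompt))` once for 16x3.mp4 and once for 16x9.mp4 keeps everything except the video identical between the two runs.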

Hello,

Thank you for sharing the details. We will thoroughly review and analyze this issue internally and get back to you as soon as possible with our findings.

We appreciate your patience and understanding.
