Multi-Image Input Support for Veo 3.1 API Video Generation

Hello,

I’m currently experimenting with Veo 3.1 for video generation.

Using the Gemini API (google-genai), I’ve confirmed that video generation works with a single reference image and a prompt. However, my goal is to input multiple images (e.g., first and last frames, or a small set of keyframes) to create one continuous video sequence, and I could not find a way to do this in the Python SDK.
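For context, here is a minimal sketch of the kind of single-image call described above. It assumes the google-genai package is installed, an API key is set in the environment, and that `veo-3.1-generate-preview` is the right model ID for your account. The helper names `load_image_fields` and `generate_from_single_image` are illustrative only and are not part of the SDK.

```python
import mimetypes
import time
from pathlib import Path


def load_image_fields(image_path):
    """Read a local image and return the bytes and MIME type for it."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognised image file: {image_path}")
    return {"image_bytes": Path(image_path).read_bytes(), "mime_type": mime_type}


def generate_from_single_image(
    prompt,
    image_path,
    model="veo-3.1-generate-preview",  # assumed model ID; adjust as needed
    poll_seconds=10,
):
    """Start a video generation from one image and wait for it to finish."""
    # Imported here so the helper above can be used without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up the API key from the environment

    operation = client.models.generate_videos(
        model=model,
        prompt=prompt,
        image=types.Image(**load_image_fields(image_path)),
    )

    # Video generation is a long-running operation, so poll until it is done.
    while not operation.done:
        time.sleep(poll_seconds)
        operation = client.operations.get(operation)

    return operation.response.generated_videos[0]
```

Calling `generate_from_single_image("A slow pan across the scene", "first_frame.png")` should return one generated video entry. The open question is how, if at all, a second image such as a last frame can be passed alongside the first.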

I’d like to clarify the following points:

  1. Is multi-image input officially supported when accessing Veo 3.1 via the Gemini API, specifically to create a video sequence (e.g., start/end frames or keyframes)?
    If not, is this a limitation of the model itself or of the current SDK?

  2. Does the Vertex AI API expose additional Veo 3.1 capabilities (such as first/last frame interpolation or multi-image input for generating a video sequence) that are not currently available through the Gemini API or the Python SDK?

Any clarification on the official support status and recommended integration patterns would be greatly appreciated.

Thank you for your time and support.

Best regards,
Khizar