Veo3 cuts off head

I tried to generate a video with a prompt and an image through the VertexAI API, and VEO3 cuts off the upper part of the person’s body. However, when I generate the video using Google Gemini, VEO3 preserves the entire person on the screen.
I conducted an experiment where I sent a request consisting of only one image (without a text prompt). VEO3 in Google Gemini generates the person at full scale, but when I send the same request through the VertexAI API, VEO3 cuts off the person’s head and neck.
Yes, I am using a negative prompt and explicitly specifying that I want to see the entire person in the video from start to finish.
What do you suggest I do?
P.S.: VEO2 does not have this problem with VertexAI.

Hi @Oleksii_Dovhan
Could you please share the reproducible code, the image, and the exact prompt so I can run it on my end? If possible, please also share a link to the video output that exhibits the cropping problem.
Thank you

link to google colab with code: Google Colab

prompt text: Cinematic full body shot of a male model performing a slow, elegant 180-degree spin to showcase a modern designer outfit. He is on a runway with bright studio lighting. High detail, 4K, photorealistic, glamorous atmosphere.

negative prompt: cropped head, cropping, cut off head, out of frame, close-up, medium shot, portrait, tight shot, partial view, incomplete body, changing race, changing ethnicity, changing skin tone, different person, different face, inconsistent features, deformed, blurry, low quality, bad anatomy, disfigured
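
For anyone reproducing this without the Colab, here is a minimal sketch of how the image, prompt, and negative prompt above could be packed into a request body. The field names (`instances`, `bytesBase64Encoded`, `mimeType`, `negativePrompt`, `aspectRatio`, `sampleCount`) are my assumption of what the Vertex AI Veo REST reference uses and should be checked against the current docs. `build_veo_request` is a hypothetical helper, not part of any SDK, and it only builds the body without sending anything.

```python
import base64
import json


def build_veo_request(image_bytes, mime_type="image/png", prompt=None,
                      negative_prompt=None, aspect_ratio="16:9",
                      sample_count=1):
    """Build a JSON-serializable body for an image-to-video request.

    Field names are assumed from the Vertex AI Veo REST reference;
    verify them before use.
    """
    instance = {
        "image": {
            # The image is sent inline as base64 text.
            "bytesBase64Encoded": base64.b64encode(image_bytes).decode("ascii"),
            "mimeType": mime_type,
        }
    }
    # The text prompt is optional, which covers the image-only experiment.
    if prompt:
        instance["prompt"] = prompt

    parameters = {"aspectRatio": aspect_ratio, "sampleCount": sample_count}
    if negative_prompt:
        parameters["negativePrompt"] = negative_prompt

    return {"instances": [instance], "parameters": parameters}


# Placeholder bytes stand in for the real input image file contents.
body = build_veo_request(
    b"placeholder image bytes",
    prompt="Cinematic full body shot of a male model",
    negative_prompt="cropped head, cut off head, out of frame",
)
print(json.dumps(body["parameters"], indent=2))
```

Building the body explicitly makes it easy to diff the image-only request against the image-plus-prompts request and confirm that only the intended fields change between runs.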

The generated video can be seen in the Google Colab.

input image:

I believe the prompts might be the issue. I’ve successfully generated a video with the model’s face using these two specific prompts. Could you please try them and let me know if you get the same result?
PROMPT = "Cinematic wide-angle shot of a handsome male model performing a slow, elegant 180-degree spin on a fashion runway. The model maintains eye contact with the camera throughout the rotation, turning his body while keeping his face partially angled toward the viewer. Professional runway lighting with key lights focused on his face to ensure clear facial visibility during the entire spin. The model showcases a modern designer outfit with confident, professional demeanor. High detail, 4K, photorealistic, glamorous fashion show atmosphere. Camera remains stationary at eye level capturing full body while maintaining clear facial features throughout the movement."
NEGATIVE_PROMPT = "face turning away from camera, back of head, profile view blocking face, shadows on face, dark lighting on face, cropped head, cut off head, out of frame, close-up, medium shot, tight crop, partial body view, changing facial features, different person, inconsistent appearance, deformed, blurry face, low quality, bad anatomy, obscured face, hair covering face"

I used your prompt and negative prompt and got a video where the model has a head, but VEO3 starts generating from the chest and only later adds the head. However, it does not preserve the face, hair, etc.

In contrast, VEO3 in Google Gemini preserves the facial features and shows the model at full height from the start of the video.
Here is the link to the Colab with the generated video.

Check this out
PROMPT = "Start with a close-up establishing shot of a handsome male model’s face and upper body, then pull back to reveal full body. The model has [specific hair description - e.g., dark brown styled hair] and [specific facial features - e.g., strong jawline, brown eyes]. He performs a slow, elegant 180-degree spin on a fashion runway while maintaining his exact facial features and hairstyle throughout. His face remains clearly visible and consistently lit during the entire rotation, with the same hair texture and facial structure from start to finish. Professional runway lighting ensures facial continuity. Modern designer outfit, confident demeanor. High detail, 4K, photorealistic, glamorous fashion show atmosphere."
NEGATIVE_PROMPT = "changing facial features, different hair texture, hair color change, facial morphing, inconsistent face, face appearing later, headless body, body without head, generating body first then head, delayed head appearance, changing hair length, different hairstyle mid-video, face transformation, morphing features, inconsistent lighting on face, shadows obscuring face, face turning completely away, back of head only, cropped head, cut off features, blurry face transition, low quality face generation"
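
Note that the bracketed parts of the prompt above are placeholders and would be sent as literal text if left in. A minimal sketch for filling them in before sending the request follows. `fill_placeholders` is a hypothetical helper and the example values are made up; they should describe the person in the actual input image.

```python
import re

# Matches one [bracketed placeholder] with no nested brackets.
PLACEHOLDER = re.compile(r"\[[^\[\]]*\]")


def fill_placeholders(template, values):
    """Replace each bracketed placeholder, in order, with the next value."""
    found = PLACEHOLDER.findall(template)
    if len(found) != len(values):
        raise ValueError(
            f"template has {len(found)} placeholders, got {len(values)} values"
        )
    remaining = iter(values)
    # A function replacement is inserted literally, with no escape processing.
    return PLACEHOLDER.sub(lambda match: next(remaining), template)


template = (
    "The model has [specific hair description - e.g., dark brown styled hair] "
    "and [specific facial features - e.g., strong jawline, brown eyes]."
)
print(fill_placeholders(template, ["short black curly hair",
                                   "a square jaw and green eyes"]))
```

Raising on a count mismatch is deliberate: it catches the case where a placeholder is forgotten and would otherwise reach the model unfilled.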

This prompt variation is better. The video begins from the upper part of the body and smoothly reveals the entire person.
But the face and hair are completely changed; the model does not pick up that information from the input image.

I think the face and hair structure can be controlled through the negative prompt. Please give that a try.