When I pass a video without an audio stream to a Gemini 3 model (tested with gemini-3-flash-preview), the request fails with 404 NOT_FOUND unless media_resolution is set to MEDIA_RESOLUTION_HIGH. Videos with an audio stream work in every case I tried, and Gemini 2.5 (gemini-2.5-flash) doesn’t seem to have this issue. Minimal repro below:
import os
import subprocess
import time
from pathlib import Path
import google.genai as genai
import google.genai.types as genai_types
VIDEO_URL = "https://test-videos.co.uk/vids/bigbuckbunny/mp4/h264/360/Big_Buck_Bunny_360_10s_1MB.mp4" # video with no audio stream
VIDEO_PATH_NO_AUDIO = Path.cwd() / "noaudio.mp4"
VIDEO_PATH_AUDIO = Path.cwd() / "withaudio.mp4"
if not VIDEO_PATH_NO_AUDIO.exists():
    subprocess.run(["curl", "-L", "-o", str(VIDEO_PATH_NO_AUDIO), VIDEO_URL], check=True)
if not VIDEO_PATH_AUDIO.exists():
    subprocess.run([
        "ffmpeg", "-i", str(VIDEO_PATH_NO_AUDIO),
        "-f", "lavfi", "-i", "anullsrc=channel_layout=stereo:sample_rate=48000",
        "-c:v", "copy", "-c:a", "aac", "-shortest",
        str(VIDEO_PATH_AUDIO)
    ], check=True)  # make video with an audio stream
client = genai.Client(
    api_key=os.environ["GEMINI_API_KEY"],
    http_options=genai_types.HttpOptions(api_version="v1beta", timeout=600_000),
)
def test_video(video_path: Path, model: str, media_resolution: str | None):
    print(f"=== Testing {video_path.name} with {model} and resolution {media_resolution}")
    uploaded = client.files.upload(file=video_path)
    # Poll until the uploaded file has finished processing
    while uploaded.state != genai_types.FileState.ACTIVE:
        if uploaded.state == genai_types.FileState.FAILED:
            # Avoid looping forever if processing fails
            raise RuntimeError(f"File processing failed for {video_path.name}")
        time.sleep(2)
        assert uploaded.name
        uploaded = client.files.get(name=uploaded.name)
    try:
        response = client.models.generate_content(
            model=model,
            contents=[{
                "role": "user",
                "parts": [
                    {"file_data": {"file_uri": uploaded.uri, "mime_type": uploaded.mime_type}},
                    {"text": "What is happening in this video?"},
                ],
            }],
            config={
                "media_resolution": media_resolution,
            }  # type: ignore
        )
        print(f"Response: {response.text}")
    except Exception as e:
        print(f"Error: {str(e)}")
    print()
test_video(VIDEO_PATH_AUDIO, "gemini-3-flash-preview", None) # success
test_video(VIDEO_PATH_NO_AUDIO, "gemini-3-flash-preview", None) # Error: 404 NOT_FOUND
test_video(VIDEO_PATH_NO_AUDIO, "gemini-3-flash-preview", "MEDIA_RESOLUTION_HIGH") # success
test_video(VIDEO_PATH_NO_AUDIO, "gemini-2.5-flash", None) # success
Output from one run:
=== Testing withaudio.mp4 with gemini-3-flash-preview and resolution None
Response: A rabbit comes out of its burrow in a green forest and starts cleaning its ears and fur.
=== Testing noaudio.mp4 with gemini-3-flash-preview and resolution None
Error: 404 NOT_FOUND. {'error': {'code': 404, 'message': 'Requested entity was not found.', 'status': 'NOT_FOUND'}}
=== Testing noaudio.mp4 with gemini-3-flash-preview and resolution MEDIA_RESOLUTION_HIGH
Response: In this video, a rabbit emerges from its burrow beneath a large tree in a sun-drenched forest clearing.
Initially, the rabbit is seen peeking its head out from the dark entrance of the burrow. It then cautiously steps out onto the lush, green grass surrounding the base of the tree. After taking a moment to look around, the rabbit begins to hop away across the grassy mound and further into the forest.
=== Testing noaudio.mp4 with gemini-2.5-flash and resolution None
Response: The video displays a serene and vibrant animated forest scene.
In the foreground, a lush green, moss-covered hillock features a large tree with prominent, gnarled roots. At the base of the tree, a dark opening suggests a cave or burrow entrance. The hillock is surrounded by vibrant green grass, with a few scattered rocks visible. Tall, verdant trees form a dense background, creating a rich woodland atmosphere.
The primary action in the video is the subtle, gradual shift of sunlight filtering through the leaves and across the grassy terrain. This movement creates dynamic patterns of light and shadow that slowly evolve over the short clip, suggesting the gentle passage of time in the forest. Small purple or red specks, possibly berries or tiny flowers, also appear and subtly shift position on the mossy mound, further enhancing the sense of a living environment. The camera remains mostly static throughout the clip.
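For anyone hitting the same thing, here is a rough sketch of a stopgap based purely on the results above: detect whether the file has an audio stream and only force MEDIA_RESOLUTION_HIGH when it doesn't. The helper names (`has_audio_stream`, `pick_media_resolution`, `probe`) are made up for illustration, the `gemini-3` prefix check is an assumption drawn from the one Gemini 3 model I tested, and the detection assumes `ffprobe` is installed and that its JSON output marks audio streams with `"codec_type": "audio"`.

```python
import json
import subprocess
from pathlib import Path
from typing import Optional


def has_audio_stream(ffprobe_json: str) -> bool:
    """Return True if ffprobe's JSON stream listing contains an audio stream."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return any(s.get("codec_type") == "audio" for s in streams)


def pick_media_resolution(model: str, has_audio: bool) -> Optional[str]:
    """Hypothetical workaround: force high resolution only for the failing case.

    Assumption: every model whose name starts with "gemini-3" behaves like
    gemini-3-flash-preview did in the repro above.
    """
    if model.startswith("gemini-3") and not has_audio:
        return "MEDIA_RESOLUTION_HIGH"
    return None


def probe(video_path: Path) -> str:
    """Run ffprobe and return its JSON description of the file's streams."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", str(video_path)],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


# Example with a hand-written ffprobe-style payload (video stream only)
sample = '{"streams": [{"index": 0, "codec_type": "video"}]}'
print(pick_media_resolution("gemini-3-flash-preview", has_audio_stream(sample)))
```

With the repro script, this would be used as `test_video(path, model, pick_media_resolution(model, has_audio_stream(probe(path))))`. It only papers over the problem, so the underlying 404 still looks like a bug to me.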