Hi, I am fine-tuning gemini-2.5-flash with video data. My fine-tuning job (say, v0) succeeded without any errors on a dataset of over 2,800 videos (7,235,244 tokens). But subsequent runs with the same training and validation datasets are failing with this error:
“Dataset example 2434 of 2816 contains a URI to a video file with invalid binary data. Please check your URIs. Example snippet: [A line from the training JSONL]”
The dataset example number (shown above) is different each time. All video files are valid (the exact same dataset ran fine previously). I tested with a smaller training set (~450 videos), which worked. Can I get any advice on how to trace why it doesn’t work with a larger dataset that previously worked?
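One way to start tracing this is to record which video URI each failing example number points to, then compare across runs to see whether the same files keep coming up. Below is a minimal Python sketch of that lookup. It assumes the training file is JSONL with one example per line, that the videos are referenced by `gs://` strings somewhere in each example, and that the example number in the error is 1-based; none of these are confirmed by the error message itself. The function names are hypothetical.

```python
import json


def find_gcs_uris(node):
    """Recursively collect every string starting with 'gs://' from parsed JSON.

    Walking the whole structure avoids assuming a particular field layout.
    """
    if isinstance(node, str):
        return [node] if node.startswith("gs://") else []
    if isinstance(node, dict):
        return [uri for value in node.values() for uri in find_gcs_uris(value)]
    if isinstance(node, list):
        return [uri for item in node for uri in find_gcs_uris(item)]
    return []


def uris_for_example(jsonl_path, example_number, one_based=True):
    """Return the gs:// URIs referenced by one example in a JSONL file.

    ASSUMPTION: the number in the error message is 1-based. Pass
    one_based=False to treat it as a 0-based index instead.
    """
    index = example_number - 1 if one_based else example_number
    with open(jsonl_path, encoding="utf-8") as f:
        # Skip blank lines so they do not shift the example count.
        examples = [json.loads(line) for line in f if line.strip()]
    return find_gcs_uris(examples[index])
```

For instance, `uris_for_example("train.jsonl", 2434)` would list the URIs for the example named in the error above (`train.jsonl` is a placeholder file name). If the flagged files differ on every run and each one opens fine on its own, that would point away from the data and toward something on the service side.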
I have been experiencing the same issue since last week. We were able to fine-tune on 10k+ videos before, and now the same job is failing.
@Venkat_31 Were you able to get a resolution?
@Dharamendra_Kumar the old jobs aren’t working. I’m trying to split the larger dataset into smaller batches and tune the same model iteratively. Do you have an alternative?
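For anyone trying the same workaround, here is a minimal Python sketch of the splitting step only. It assumes the training data is a single JSONL file with one example per line; the function name, the output file naming, and the chunk size are all hypothetical choices, and submitting each shard as a tuning job is not shown.

```python
from pathlib import Path


def split_jsonl(src, out_dir, chunk_size):
    """Split a JSONL file into shards of at most chunk_size examples each.

    Returns the list of shard paths in order. The shard naming scheme is
    an arbitrary choice for illustration.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    with open(src, encoding="utf-8") as f:
        # Drop blank lines and normalise line endings so every example
        # ends with exactly one newline.
        lines = [line.rstrip("\r\n") + "\n" for line in f if line.strip()]

    shards = []
    for start in range(0, len(lines), chunk_size):
        path = out_dir / f"train_part_{start // chunk_size:03d}.jsonl"
        with open(path, "w", encoding="utf-8") as out:
            out.writelines(lines[start:start + chunk_size])
        shards.append(path)
    return shards
```

Calling `split_jsonl("train.jsonl", "shards", 450)` would write shards of up to 450 examples each, roughly the size that worked in the original post. Whether that size is a real threshold or a coincidence is not known from this thread.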
This is the approach we followed; however, tuning iteratively takes a lot of time.
@Dharamendra_Kumar do you have any recommended approaches apart from continuous tuning, or have you gotten any support from Google’s side to run tuning jobs for larger datasets?
We are facing the same problem. We restricted the fine-tuning to files between 1 MB and 100 MB in size but are still seeing the same issue. Not sure how to proceed on this. Iterative fine-tuning is not a process that can scale for my use case.