I have image data for three classes, and the dataset may contain some mislabeled samples. I want to filter out as many of the mislabeled samples as possible. I am thinking of applying the Majority Filtering method. The algorithm goes like this:
- I take a subset of the training data.
- Train multiple classifiers on that same subset.
- Predict on a different subset using all the classifiers.
- If the majority of the classifiers predict a label different from the given one, I tag the sample as mislabeled and don’t consider it in the next iteration.
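To make the voting step concrete, here is a minimal sketch of what I have in mind, in plain Python. The function name `majority_filter` and the toy labels and predictions are made up for illustration; in practice the predictions would come from the trained classifiers.

```python
def majority_filter(labels, predictions_per_classifier):
    """Return indices of samples whose given label is contradicted
    by more than half of the classifiers.

    labels: list of given (possibly noisy) labels, one per sample.
    predictions_per_classifier: one list of predicted labels per classifier,
        each aligned with `labels`.
    """
    n_classifiers = len(predictions_per_classifier)
    flagged = []
    for i, label in enumerate(labels):
        # Count how many classifiers disagree with the given label.
        wrong = sum(1 for preds in predictions_per_classifier if preds[i] != label)
        if wrong > n_classifiers / 2:
            flagged.append(i)
    return flagged


# Toy data (hypothetical): 4 samples, 3 classifiers, classes 0..2.
labels = [0, 1, 2, 1]
predictions = [
    [0, 1, 2, 0],  # classifier A
    [0, 2, 2, 0],  # classifier B
    [0, 1, 1, 0],  # classifier C
]

suspects = majority_filter(labels, predictions)
# Indices that survive and would be used in the next iteration.
kept = [i for i in range(len(labels)) if i not in suspects]
```

Only the last sample is flagged here, because all three classifiers disagree with its label; the samples where a single classifier disagrees are kept.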
Now I am having trouble managing all this data. I used the image_dataset_from_directory method to import the images, but the data is shuffled during training, so I can’t keep track of which samples are mislabeled and which are correctly labeled. I am also not sure how to exclude the mislabeled ones in the next training loop.
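One direction I am considering, though I do not know if it is the right approach, is to keep my own index of file paths and labels, so that each sample has a stable identity regardless of shuffling. Below is a rough sketch using only the standard library. It assumes one sub-directory per class; the helper names `index_images` and `drop_flagged`, the suffix list, and the class and file names in the demo are all made up for illustration.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions; adjust to the actual data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def index_images(root):
    """Map each image path (as a string) to the name of its class sub-directory."""
    root = Path(root)
    index = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                index[str(path)] = class_dir.name
    return index


def drop_flagged(index, flagged_paths):
    """Return a copy of the index without the paths tagged as mislabeled."""
    flagged = set(flagged_paths)
    return {path: label for path, label in index.items() if path not in flagged}


# Demo on a throwaway directory with empty placeholder files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    for cls, name in [("cat", "a.jpg"), ("cat", "b.jpg"), ("dog", "c.jpg")]:
        (Path(tmp) / cls).mkdir(exist_ok=True)
        (Path(tmp) / cls / name).touch()

    index = index_images(tmp)
    suspect = str(Path(tmp) / "cat" / "b.jpg")
    # Paths that would be used to build the dataset for the next iteration.
    remaining = drop_flagged(index, [suspect])
```

The surviving paths and labels could then presumably be turned into a dataset for the next loop, for example with tf.data.Dataset.from_tensor_slices plus a function that loads each image, but I have not verified that this fits well with the rest of the pipeline.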