The ImageNet dataset consists of 1000 classes, each stored in its own subdirectory. We can load all 1000 classes at once with tf.keras.utils.image_dataset_from_directory(), which loads them into a single tf.data.Dataset.
Now suppose that instead of loading all 1000 classes, I only need to load the first 100. What should I do?
Hello,
I’d be glad to help you load a specific subset of classes from the ImageNet dataset using TensorFlow’s tf.keras.utils.image_dataset_from_directory function. Here’s an approach that selects the classes up front:
- Import the necessary libraries:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
- Define the class subset:
Create a list containing the names of the 100 classes you want to load.
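A minimal sketch of building that list. `first_n_classes` is a hypothetical helper, not part of TensorFlow; it assumes the class folders sit directly under one directory and approximates how Keras infers class names (sorted subdirectory names). Skipping hidden entries is an assumption.

```python
import os


def first_n_classes(directory, n=100):
    """Return the first n class folder names under `directory`.

    Only subdirectories count as classes, hidden entries (names starting
    with '.') are skipped, and names are sorted alphabetically so the
    order matches the label indices Keras would assign.
    """
    subdirs = sorted(
        name
        for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name)) and not name.startswith(".")
    )
    return subdirs[:n]
```

The resulting list can be passed to `ImageDataGenerator().flow_from_directory(directory, classes=selected_classes, ...)`, whose `classes` argument accepts a subset of the class subdirectories.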
Hi @M_Akrm,
import os
import tensorflow as tf

# Assuming 'directory' is the path to the ImageNet dataset directory
directory = '/path/to/imagenet'
# Get the sorted list of subdirectories (one per class)
subdirs = sorted(os.listdir(directory))
# Select the first 100 subdirectories, or list your desired ImageNet class names explicitly instead
selected_subdirs = subdirs[:100]  # e.g. ['n01440764', 'n01443537', ...]
# Load the selected classes into a tf.data.Dataset
dataset = tf.keras.preprocessing.image_dataset_from_directory(
    directory,
    labels='inferred',
    label_mode='int',
    class_names=selected_subdirs,
    color_mode='rgb',
    batch_size=32,
    image_size=(256, 256),
    shuffle=True,
    seed=None,
    validation_split=None,
    subset=None,
    interpolation='bilinear',
    follow_links=False,
    crop_to_aspect_ratio=False,
    pad_to_aspect_ratio=False,
    data_format=None,
    verbose=True
)
For more information, see the TensorFlow documentation for tf.keras.preprocessing.image_dataset_from_directory.
Thanks.
You can filter the class labels when using image_dataset_from_directory, or create a list of the 100 desired classes to load!
I tried this method; however, ImageDataGenerator doesn’t drop the last batch if it is incomplete, which gives me an error in my custom training.
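If you stay with ImageDataGenerator, one workaround is to skip any batch that is smaller than the batch size. A minimal sketch; `drop_incomplete_batches` is a hypothetical helper, not part of Keras, and it assumes each batch is an `(x, y)` pair whose `x` supports `len()`, as the iterator returned by `flow_from_directory` yields.

```python
def drop_incomplete_batches(batches, batch_size):
    """Yield only the (x, y) batches that hold exactly `batch_size` samples.

    Works lazily on any iterable of (x, y) pairs, including the endless
    iterator returned by ImageDataGenerator.flow_from_directory.
    """
    for x, y in batches:
        if len(x) == batch_size:
            yield x, y
```

With a tf.data pipeline instead, the built-in equivalent is `dataset.unbatch().batch(32, drop_remainder=True)`.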
ValueError: The `class_names` passed did not match the names of the subdirectories of the target directory. Expected: ['n01440764', 'n01443537', 'n01484850', 'n01491361', 'n01494475', 'n01496331', 'n01498041', 'n01514668', 'n01514859', 'n01518878', 'n01530575', 'n01531178', …] but received: ['n01440764', 'n01443537', 'n01484850', 'n01491361', 'n01494475', 'n01496331', 'n01498041', 'n01514668', 'n01514859', 'n01518878']
It looks like class_names has to list all 1000 folders.
I tried the code above, which I believe does the same thing, but it raises a ValueError because the class_names argument must contain the names of all 1000 classes.
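You’re right, class_names in image_dataset_from_directory expects the names of all class subdirectories. To load only the first 100 classes, you can filter after creating the dataset:
import tensorflow as tf

data_dir = '/path/to/imagenet'  # Path to your image directory
# Load the full dataset unbatched so the filter sees one (image, label) pair at a time
dataset = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    label_mode='int',  # labels are integer indices into dataset.class_names
    batch_size=None,   # yield individual samples instead of batches
    shuffle=False      # keep order for easy filtering
)
# Get a list of the first 100 class names (class names are sorted, so their labels are 0..99)
first_100_classes = dataset.class_names[:100]
# Filter the dataset to only include those classes by comparing the integer label
filtered_dataset = dataset.filter(lambda x, y: y < len(first_100_classes))
# Re-batch, dropping the incomplete final batch
filtered_dataset = filtered_dataset.batch(32, drop_remainder=True)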
This approach avoids the ValueError and lets you work with just the desired classes, although the pipeline still reads and decodes images from all 1000 classes before discarding the unwanted ones.
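As an alternative that never touches the other 900 class folders, you could point the loader at a directory that contains only the wanted classes. A minimal sketch, assuming a filesystem that supports symbolic links; `make_class_subset_dir` is a hypothetical helper, and the destination path is up to you.

```python
import os


def make_class_subset_dir(src_dir, dst_dir, n=100):
    """Create dst_dir holding symlinks to the first n class folders of src_dir.

    Class folders are taken in sorted order (hidden entries skipped), so
    the subset keeps the same relative label order. Returns the linked names.
    """
    classes = sorted(
        name
        for name in os.listdir(src_dir)
        if os.path.isdir(os.path.join(src_dir, name)) and not name.startswith(".")
    )[:n]
    os.makedirs(dst_dir, exist_ok=True)
    for name in classes:
        link = os.path.join(dst_dir, name)
        if not os.path.lexists(link):
            # Use the absolute target so the link resolves from any working directory
            target = os.path.abspath(os.path.join(src_dir, name))
            os.symlink(target, link, target_is_directory=True)
    return classes
```

You would then call `tf.keras.utils.image_dataset_from_directory(dst_dir, follow_links=True, ...)`, which sees only the linked folders, so labels run from 0 to 99 and `class_names` is not needed. Creating symlinks may require extra privileges on Windows; copying the folders with `shutil.copytree` also works, at the cost of disk space.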