Creating datasets from scratch tutorial needed

Nicci_Thomson · August 9, 2023, 3:40am

I have spent the past few months completing accreditations in TensorFlow and would like to build a CNN for image classification from scratch. Rather than use an existing dataset, I plan to collect and create a small one myself, but I don’t know where to start with the correct way to collate and prepare the data. If anybody can recommend a short course or tutorials that would step me through the process, I would really appreciate it. Thanks!

Kiran_Sai_Ramineni · August 9, 2023, 9:06am

Hi @Nicci_Thomson,

To collect images, you can capture them yourself, download them from the internet, and so on. Once you have collected images that suit your classification task, place them in sub directories inside a main directory, and name each sub directory after the class of the images it holds. For example, with two classes, A and B, the directory structure will be:

main_directory/
...class_a/
......a_image_1.jpg
......a_image_2.jpg
...class_b/
......b_image_1.jpg
......b_image_2.jpg
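
Before loading the images, it can help to check that the folder layout above is what you expect. Here is a minimal sketch using only the Python standard library. The helper name count_images_per_class and the extension list are my own assumptions for illustration, not part of TensorFlow.

```python
from pathlib import Path

# Assumed list of image extensions; adjust it to match your own files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def count_images_per_class(main_directory):
    """Return {class_name: image_count}, one entry per sub directory.

    Hypothetical helper for illustration; each sub directory of
    main_directory is treated as one class.
    """
    counts = {}
    for class_dir in sorted(Path(main_directory).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class folders
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

Calling count_images_per_class("main_directory") on the layout above would report two images in each class, which makes it easy to spot an empty or badly unbalanced class early.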

If you have only a few images, you can apply data augmentation to artificially expand your dataset, which helps the trained model generalize better.
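
As a rough illustration of augmentation, the sketch below chains a few Keras preprocessing layers. The specific layers and factors are assumptions I picked for the example, and the random tensor only stands in for a batch of real images.

```python
import tensorflow as tf

# Assumed augmentation choices; tune or replace them for your own images.
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),  # random left-right mirror
    tf.keras.layers.RandomRotation(0.1),       # up to +/-10% of a full turn
    tf.keras.layers.RandomZoom(0.1),           # zoom in or out by up to 10%
])

# Stand-in batch: 4 random RGB images of 180x180 pixels (not real data).
images = tf.random.uniform((4, 180, 180, 3), maxval=255.0)

# training=True is needed here; these layers pass images through
# unchanged at inference time.
augmented = data_augmentation(images, training=True)
print(augmented.shape)  # same shape as the input batch
```

These layers do not write new files to disk. They produce a differently transformed version of each image every time a batch passes through, so the model sees more variety across epochs.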

Now you can use tf.keras.utils.image_dataset_from_directory() to create the training and validation datasets.

For example,

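import tensorflow as tf

# Placeholder values (assumptions): adjust these for your own data.
data_dir = "main_directory"
img_height, img_width = 180, 180
batch_size = 32

# Training subset: 80% of the images.
train_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

# Validation subset: the remaining 20%.
# Use the same seed and validation_split so the two subsets do not overlap.
val_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)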

Before passing the images to the model, make sure they are normalized. If they are not, add a normalization layer to the model.
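
One possible way to normalize is a Rescaling layer that maps pixel values from [0, 255] to [0, 1]. The sketch below applies it to a dataset with map. The small synthetic dataset is an assumption that stands in for the dataset built from your directory.

```python
import tensorflow as tf

# Scale pixel values from the [0, 255] range down to [0, 1].
normalization_layer = tf.keras.layers.Rescaling(1.0 / 255)

# Stand-in data (not real images): 2 random RGB images with integer labels.
images = tf.random.uniform((2, 180, 180, 3), maxval=255.0)
labels = tf.constant([0, 1])
ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(2)

# Apply the layer to the images of every batch; labels pass through unchanged.
normalized_ds = ds.map(lambda x, y: (normalization_layer(x), y))

batch_images, batch_labels = next(iter(normalized_ds))
print(float(tf.reduce_min(batch_images)), float(tf.reduce_max(batch_images)))
```

Alternatively, the same Rescaling layer can be placed as the first layer of the model, so normalization travels with the model when it is saved or deployed.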

Now you can train your model with those images.
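
For completeness, here is a minimal training sketch. The architecture, image size, and hyperparameters are assumptions chosen only to keep the example small, and the random tensors stand in for the training dataset. The integer labels match the default label format of image_dataset_from_directory.

```python
import tensorflow as tf

num_classes = 2                  # e.g. class_a and class_b
img_height, img_width = 64, 64   # assumed size, kept small for the sketch

# A deliberately small CNN; treat it as a starting point, not a recipe.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(img_height, img_width, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(num_classes),  # raw logits, one per class
])

model.compile(
    optimizer="adam",
    # Sparse loss because the labels are integer class ids.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Synthetic stand-in data, already in [0, 1]; use your own datasets instead.
images = tf.random.uniform((8, img_height, img_width, 3))
labels = tf.random.uniform((8,), maxval=num_classes, dtype=tf.int32)
synthetic_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(4)

history = model.fit(synthetic_ds, epochs=1, verbose=0)
```

With real data you would pass the training dataset to fit, supply the validation dataset through validation_data, and train for more than one epoch.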

Thank You!
