I have a U-Net model that I would like to train on a large dataset of images stored on my local machine. The dataset is too big to load into memory all at once. I also cannot arrange the images in a way that works with image_dataset_from_directory, because this is a segmentation task: each pixel of an image must be associated with its own class, so folder names cannot encode the labels.
What is the correct way to load this dataset and train the model? How can I build an appropriate input pipeline?
The dataset is stored in four directories: two for labels (train, test) and two for source images. Each of the four directories contains a set of folders named with unique ids, and each id folder holds either a .npy image or a .tif label.
The structure is shown here:
train_labels > id_1 > label.tif
train_images > id_1 > image.npy
test_labels > id_2 > label.tif
test_images > id_2 > image.npy
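To make the layout concrete, here is a minimal sketch of the kind of lazy pipeline I am imagining: pair each id's image.npy with its label.tif by folder name, then yield the arrays one sample at a time instead of loading everything into memory. The function names and the pluggable load_label reader are my own placeholders, not an established API.

```python
# Sketch of an out-of-core loading pipeline for the layout above.
# Assumption: every id folder under the images directory that has a
# matching id folder under the labels directory forms one sample.
from pathlib import Path
import numpy as np

def list_pairs(images_dir, labels_dir):
    """Pair each <id>/image.npy with its <id>/label.tif by folder name."""
    images_dir, labels_dir = Path(images_dir), Path(labels_dir)
    pairs = []
    for img_path in sorted(images_dir.glob("*/image.npy")):
        lbl_path = labels_dir / img_path.parent.name / "label.tif"
        if lbl_path.exists():  # skip ids with no matching label
            pairs.append((img_path, lbl_path))
    return pairs

def sample_stream(pairs, load_label):
    """Lazily yield (image, label) arrays, one pair at a time.
    load_label is a user-supplied .tif reader (e.g. tifffile.imread
    or a Pillow-based function) -- left abstract here on purpose."""
    for img_path, lbl_path in pairs:
        yield np.load(img_path), load_label(lbl_path)
```

If the framework is TensorFlow, a generator like this could presumably be wrapped with tf.data.Dataset.from_generator (with output_signature describing the image and mask shapes) and then batched and prefetched, but I am not sure this is the idiomatic approach.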