Custom sampler inside a `tf.data` pipeline

Sayak_Paul · April 23, 2021, 5:03am

Hi folks.

I have a binary segmentation use case, i.e., each pixel can belong to only one of two classes. The two classes are unevenly represented in the training images. This is essentially a class imbalance problem, but in 3D space, which makes it a bit complicated to handle.

So, instead of setting `sample_weight` (which is the usual recommendation for this problem), I did some research and found the following to be a pretty elegant way of dealing with it: when feeding a batch of samples to the model, always ensure that the fraction of images containing the positive class is at or above a predefined ratio.

The ground-truth segmentation masks contain 0’s and/or 1’s. One way to check whether a mask contains the positive class is to look at its mean: a mask with no positive-class pixels has a mean of 0, so any mean above 0 indicates at least one positive pixel.
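To make the mean check concrete, here is a minimal sketch of how it might look as a `tf.data` predicate. The toy arrays, their shapes, and the `has_positive` helper are assumptions made up for illustration; they are not part of any existing pipeline.

```python
import numpy as np
import tensorflow as tf


def has_positive(image, mask):
    # A 0/1 mask with no positive pixels has a mean of exactly 0,
    # so a mean above 0 means at least one positive pixel is present.
    return tf.reduce_mean(tf.cast(mask, tf.float32)) > 0.0


# Toy data (hypothetical): 6 samples, of which only samples 1 and 4
# contain the positive class.
images = np.random.rand(6, 4, 4, 1).astype("float32")
masks = np.zeros((6, 4, 4, 1), dtype="float32")
masks[[1, 4], 0, 0, 0] = 1.0

ds = tf.data.Dataset.from_tensor_slices((images, masks))

# Split the stream into samples with and without the positive class.
positives = ds.filter(has_positive)
negatives = ds.filter(
    lambda image, mask: tf.logical_not(has_positive(image, mask))
)
```

Since the check reduces over the whole mask, it should work unchanged for 3D volumes as well.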

I am looking for snippets/pointers/approaches on how to realize this inside a tf.data pipeline.

This is a tried and tested method (see here and here).
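For reference, this is a rough sketch of one way the batch constraint could be expressed, not a definitive implementation. It batches a positives-only stream and a general stream separately, then concatenates them so every batch holds at least a fixed number of positive samples. `BATCH_SIZE`, `MIN_POSITIVE_RATIO`, the toy data, and the helper names are all assumptions chosen for illustration.

```python
import numpy as np
import tensorflow as tf

BATCH_SIZE = 8
MIN_POSITIVE_RATIO = 0.25  # hypothetical target ratio
N_POS = int(np.ceil(BATCH_SIZE * MIN_POSITIVE_RATIO))  # guaranteed positives
N_REST = BATCH_SIZE - N_POS  # remaining slots, drawn from all samples


def has_positive(image, mask):
    # Mean above 0 means the mask has at least one positive pixel.
    return tf.reduce_mean(tf.cast(mask, tf.float32)) > 0.0


# Toy data (hypothetical): 40 samples, only the first 5 are positive.
images = np.random.rand(40, 4, 4, 1).astype("float32")
masks = np.zeros((40, 4, 4, 1), dtype="float32")
masks[:5, 0, 0, 0] = 1.0

ds = tf.data.Dataset.from_tensor_slices((images, masks))

# Positives-only stream; repeat() oversamples the rare positive samples.
positives = ds.filter(has_positive).shuffle(64).repeat().batch(N_POS)
# General stream over every sample (it may contribute extra positives).
rest = ds.shuffle(64).repeat().batch(N_REST)


def merge(pos, other):
    # Concatenate the two partial batches along the batch axis.
    pos_images, pos_masks = pos
    other_images, other_masks = other
    return (
        tf.concat([pos_images, other_images], axis=0),
        tf.concat([pos_masks, other_masks], axis=0),
    )


balanced = tf.data.Dataset.zip((positives, rest)).map(merge)
```

Both streams are infinite because of `repeat()`, so training with `model.fit` would need `steps_per_epoch`. The positives always sit at the front of each batch here; a per-batch shuffle could be added inside `merge` if that ordering matters.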
