How to prepare data for multi-label classification with Bert?

Martin · June 26, 2021, 8:42am

I want to do multi-label (not multi-class) text classification, and my data is in this format:
1 this is a test. 0,0,1,0
2 this is another test 0,1,1,1
3 one more test 1,0,0,1

How should I prepare my data so that the Keras preprocessing API can easily create a tf.data.Dataset from it? For single-label classification, I can use the one-directory-per-class layout from the Keras/TF tutorial, as shown below. For multi-label classification, how should I organize my data so that tf.keras.preprocessing.text_dataset_from_directory still works?

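import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE
batch_size = 32  # example value, assumed
seed = 42        # example value, assumed

# One subdirectory per class; the label is inferred from the directory name.
raw_train_ds = tf.keras.preprocessing.text_dataset_from_directory(
    'aclImdb/train',
    batch_size=batch_size,
    validation_split=0.2,
    subset='training',
    seed=seed)

class_names = raw_train_ds.class_names
train_ds = raw_train_ds.cache().prefetch(buffer_size=AUTOTUNE)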

Bhack · June 26, 2021, 4:31pm

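You cannot use text_dataset_from_directory directly for multi-label data like this, since it derives a single class per file from the directory structure.

See this example. It is for images, but the approach is much the same.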
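For the text format shown in the question, one possible approach is to skip the directory loader and build the dataset from in-memory tensors. The following is only a minimal sketch, not the example linked above: parse_line and build_dataset are hypothetical helper names, and it assumes each line is "<id> <text> <comma-separated 0/1 labels>" with no spaces inside the label field.

```python
def parse_line(line):
    """Split '<id> <text> <l1,l2,...>' into (text, multi-hot label list)."""
    _, rest = line.strip().split(maxsplit=1)    # drop the leading row id
    text, label_str = rest.rsplit(maxsplit=1)   # labels are the last token
    labels = [float(x) for x in label_str.split(",")]
    return text, labels


def build_dataset(lines, batch_size=32):
    """Build a batched tf.data.Dataset of (text, multi-hot labels) pairs."""
    # Imported here so parse_line can be used without TensorFlow installed.
    import tensorflow as tf

    texts, labels = zip(*(parse_line(line) for line in lines))
    # Each element is (scalar string, float32 vector of length num_labels).
    ds = tf.data.Dataset.from_tensor_slices((list(texts), list(labels)))
    return ds.batch(batch_size).cache().prefetch(tf.data.AUTOTUNE)


# The three rows from the question, used as sample input.
sample_lines = [
    "1 this is a test. 0,0,1,0",
    "2 this is another test 0,1,1,1",
    "3 one more test 1,0,0,1",
]

parsed = [parse_line(line) for line in sample_lines]
```

Calling build_dataset(sample_lines) should then give a dataset you can pass to model.fit in place of the directory-based one. For multi-label targets like these, a common setup is a final layer with one sigmoid unit per label, trained with binary cross-entropy.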
