Understanding a TensorFlow data structure

Nader_Afshar · August 11, 2023, 3:32am

I am looking at some Kaggle code that builds a tf.data.Dataset object and then splits the data held in that object for further processing. I am having a hard time understanding what happens to the data in the following two steps:

Creating the dataset object from a tuple made of data and labels
Splitting the dataset object again into a tuple of (tuple, list).

Please see below:

def split_labels(x, y):
    return (x[0], x[1]), y

t_dataset = (
    tf.data.Dataset.from_tensor_slices(
        (
            df_train[['premise','hypothesis']].values,
            keras.utils.to_categorical(df_train['label'], num_classes=3)
        )
    )
)

x_preprocessed = t_dataset.map(split_labels)

Do I understand correctly that the only difference between the data structure before the call to split_labels and after is that:

before the call the data structure is a Tuple made up of two Lists
after the call the data structure is a Tuple made of a Tuple and a List?

Thank you

Gelassen · August 14, 2023, 6:48am

I don't have hands-on experience with Keras, but below are my thoughts, which you might find helpful.

split_labels is an ordinary named function (not a lambda) that is applied to each element of the original dataset. The .map() operation does this, so .map() is the higher-order function here.

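from_tensor_slices slices its input along the first dimension, so each dataset element has one dimension fewer than the original arrays. This is covered in the official documentation for tf.data.Dataset.from_tensor_slices. The input is indeed a tuple, but its first element is a 2-dimensional array of shape (N, 2) holding the premise and hypothesis strings, and its second element is a 2-dimensional one-hot matrix of shape (N, 3), so neither is a flat list. Each dataset element is therefore a pair: a tensor of 2 strings and a tensor of 3 label values. If in doubt, the best choice is to run it or write a unit test to see how it behaves under different conditions.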
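A quick way to check what from_tensor_slices produces is to build a tiny dataset and look at one element. This is only a sketch: the arrays pairs and one_hot are made-up stand-ins for df_train[['premise','hypothesis']].values and the to_categorical output, not the Kaggle data.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data standing in for df_train (3 rows).
pairs = np.array([["p0", "h0"], ["p1", "h1"], ["p2", "h2"]])  # shape (3, 2)
one_hot = np.eye(3, dtype="float32")                          # shape (3, 3)

# Slicing happens along the first axis: one dataset element per row.
ds = tf.data.Dataset.from_tensor_slices((pairs, one_hot))

num_elements = int(ds.cardinality().numpy())
first_x, first_y = next(iter(ds))

print(num_elements)                 # number of rows in the input
print(first_x.shape, first_y.shape) # per-element shapes: (2,) and (3,)
```

So the dataset is a sequence of (features, label) pairs, where features is a 1-D tensor of two strings and label is a 1-D tensor of three values.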

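After split_labels is called on each element, the element structure gains one level of nesting: the 2-string tensor is unpacked into a tuple of two scalar strings, so each element becomes ((premise, hypothesis), label). The label is passed through unchanged.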
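The before/after structure can be compared directly with element_spec, which reports the nesting and shapes of one dataset element. The snippet below is a minimal sketch with invented sample rows (features and raw_labels are assumptions, not the original df_train); split_labels is copied from the question.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for df_train: 4 (premise, hypothesis) rows and labels.
features = np.array([
    ["a man eats", "a person eats"],
    ["a dog runs", "a cat sleeps"],
    ["it rains", "it is sunny"],
    ["she sings", "someone sings"],
])
raw_labels = [0, 1, 2, 0]
labels = tf.keras.utils.to_categorical(raw_labels, num_classes=3)  # shape (4, 3)

def split_labels(x, y):
    # x is a 1-D tensor of two strings; unpack it into two scalar tensors.
    return (x[0], x[1]), y

before = tf.data.Dataset.from_tensor_slices((features, labels))
after = before.map(split_labels)

# before: (strings of shape (2,), label of shape (3,))
print(before.element_spec)
# after: ((scalar string, scalar string), label of shape (3,))
print(after.element_spec)
```

In other words, the elements are tensors rather than Python lists, and the only structural change made by split_labels is that the first component goes from one tensor of shape (2,) to a tuple of two scalar tensors. This nested form is convenient when a model takes premise and hypothesis as two separate inputs.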
