I am looking at a Kaggle notebook that builds a tf.data.Dataset object and then splits the data held in that object for further processing. I am having a hard time understanding what happens to the data in the following two steps:
- Creating the dataset object from a tuple made of data and labels
- Splitting the dataset object again into a tuple made of a tuple and a list.
Please see below:
    def split_labels(x, y):
        return (x[0], x[1]), y

    t_dataset = (
        tf.data.Dataset.from_tensor_slices(
            (
                df_train[['premise', 'hypothesis']].values,
                keras.utils.to_categorical(df_train['label'], num_classes=3)
            )
        )
    )

    x_preprocessed = t_dataset.map(split_labels)
Do I understand correctly that the only difference between the data structure before and after the call to split_labels is this:
- before the call, the data structure is a tuple made up of two lists
- after the call, the data structure is a tuple made of a tuple and a list?
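One way to check this directly is to print the element_spec of the dataset before and after the map call, since it reports the structure, shape and dtype of a single element. The following is only a minimal sketch: the toy `pairs` and `labels` arrays are made-up stand-ins for df_train[['premise','hypothesis']].values and the one-hot labels (built here with np.eye rather than keras.utils.to_categorical), so the real notebook may differ in dtypes.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for df_train[['premise', 'hypothesis']].values:
# two rows, each holding a premise string and a hypothesis string.
pairs = np.array([
    ["A man is sleeping.", "A person is awake."],
    ["A dog runs.", "An animal moves."],
])

# Hypothetical stand-in for the one-hot encoded labels (3 classes).
labels = np.eye(3, dtype="float32")[[2, 0]]


def split_labels(x, y):
    # x is one row of shape (2,); split it into two scalar string tensors.
    return (x[0], x[1]), y


t_dataset = tf.data.Dataset.from_tensor_slices((pairs, labels))
x_preprocessed = t_dataset.map(split_labels)

# element_spec describes one element of the dataset.
before = t_dataset.element_spec
after = x_preprocessed.element_spec

# before: a 2-tuple of (string tensor of shape (2,), float tensor of shape (3,))
print(before)
# after: a 2-tuple of ((scalar string, scalar string), float tensor of shape (3,))
print(after)
```

With this toy data, each element is made of tensors rather than Python lists: before the map the first component is a single string tensor holding both sentences, and after the map it is a tuple of two scalar string tensors, while the label component is unchanged.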
Thank you