Convert TensorFlow PrefetchDataset to a MapDataset

Hi, is there a way to convert a PrefetchDataset to a MapDataset (which is the default dataset type in TensorFlow and also the most used)? I have a TF prefetch dataset because I used tf.data.experimental.make_csv_dataset() to load a CSV file directly into a TensorFlow dataset. As the data is large, I did not want to convert it to pandas/list/NumPy before converting it to TensorFlow.
As most functions support MapDataset, I want to convert my prefetch dataset.
My dataset consists of multiple string and int columns, and I have a requirement to sort the dataset on 2 int columns.

Hi @Sudh_Kumar, the dataset type depends on how the dataset was created. For example,

If the dataset was created using dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3]) then the type of dataset will be

tensorflow.python.data.ops.from_tensor_slices_op._TensorSliceDataset

If the dataset was created using dataset = tf.data.Dataset.range(10) then the type of dataset will be

tensorflow.python.data.ops.range_op._RangeDataset

A prefetch dataset is created when the .prefetch method is applied to a dataset. For example,

import tensorflow as tf

dataset = tf.data.Dataset.range(10)
prefetch_dataset = dataset.prefetch(3)  # keep up to 3 elements buffered ahead
type(prefetch_dataset)
# output: tensorflow.python.data.ops.prefetch_op._PrefetchDataset

To convert a prefetch dataset to a map dataset, apply the .map method to prefetch_dataset. For example,

map_dataset = prefetch_dataset.map(lambda x: x)  # identity map: values are unchanged
type(map_dataset)
# output: tensorflow.python.data.ops.map_op._MapDataset
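
As a side note, here is a minimal sketch (assuming TensorFlow 2.x; the private class names that type() reports can differ between releases) suggesting that the concrete class rarely matters in practice. Each variant is still a tf.data.Dataset with the same element_spec, and the identity map leaves the values unchanged.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(10)
prefetch_dataset = dataset.prefetch(3)
map_dataset = prefetch_dataset.map(lambda x: x)

# Each variant is a tf.data.Dataset, so the same methods are available on all of them.
variants = (dataset, prefetch_dataset, map_dataset)
for ds in variants:
    print(type(ds).__name__, isinstance(ds, tf.data.Dataset), ds.element_spec)

# The identity map does not alter the elements.
values = [int(v) for v in map_dataset.as_numpy_iterator()]
print(values)
```

So methods such as .map, .filter, .batch, or .unbatch should be callable directly on the prefetch dataset as well, without converting it first.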

Thank You!

Hi Kiran, thanks a lot for your detailed reply.

I have a CSV dataset of 7-8 columns, so I used tf.data.experimental.make_csv_dataset() to create a TensorFlow dataset, and all the data is stored in ordered dictionaries.

Hi @Sudh_Kumar, once you have created a dataset using tf.data.experimental.make_csv_dataset(), the dataset type will be tensorflow.python.data.ops.prefetch_op._PrefetchDataset. You can then apply the map function to convert it to tensorflow.python.data.ops.map_op._MapDataset. For example, I created a dataset using

import tensorflow as tf

titanic_file_path = tf.keras.utils.get_file("train.csv", "https://storage.googleapis.com/tf-datasets/titanic/train.csv")

titanic_csv_ds = tf.data.experimental.make_csv_dataset(
    titanic_file_path,
    batch_size=5, # Artificially small to make examples easier to show.
    label_name='survived',
    num_epochs=1,
    ignore_errors=True,
)

type(titanic_csv_ds)
#output: tensorflow.python.data.ops.prefetch_op._PrefetchDataset
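
To see what the batches of such a dataset look like, here is a small sketch on a hypothetical three-column CSV written to a temporary file (the file name, column names, and values are made up for illustration). It assumes TensorFlow 2.x. Because label_name is not passed here, each element should be a dictionary mapping column names to batched tensors rather than a (features, label) pair.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical toy CSV, created only for this illustration.
csv_path = os.path.join(tempfile.mkdtemp(), "toy.csv")
with open(csv_path, "w") as f:
    f.write("name,group,rank\na,1,2\nb,2,1\nc,1,1\n")

toy_ds = tf.data.experimental.make_csv_dataset(
    csv_path,
    batch_size=2,
    num_epochs=1,
    shuffle=False,  # keep file order so the printed batch is predictable
)

# Each element is a dict of column name -> tensor holding one batch of values.
for batch in toy_ds.take(1):
    for key, value in batch.items():
        print(key, value.dtype, value.numpy())
```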

Now if I apply a map to titanic_csv_ds:

def map_function(features, label):
    return features, label

map_dataset = titanic_csv_ds.map(map_function)

type(map_dataset)
#output: tensorflow.python.data.ops.map_op._MapDataset
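
Regarding the sorting requirement from the question: as far as I know, tf.data does not provide a built-in sort transformation, so converting to a map dataset does not by itself enable sorting. One possible approach, sketched below, is to unbatch the dataset, pull the rows into memory, sort them in Python, and rebuild a dataset. This assumes the rows fit in memory, which may not hold for a large CSV. The column names "name", "group", and "rank" and the from_tensor_slices stand-in are hypothetical and only mimic the batched dict-of-columns structure that make_csv_dataset yields.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for a batched, prefetched dict-of-columns dataset.
columns = {
    "name": ["d", "a", "c", "b"],
    "group": [2, 1, 2, 1],
    "rank": [1, 2, 0, 1],
}
batched_ds = tf.data.Dataset.from_tensor_slices(columns).batch(2).prefetch(1)

# Undo the batching and collect individual rows (dicts of NumPy values) in memory.
rows = list(batched_ds.unbatch().as_numpy_iterator())

# Sort on the two int columns: first by "group", then by "rank".
rows.sort(key=lambda row: (row["group"], row["rank"]))

# Rebuild a dataset from the sorted rows, one array per column.
sorted_columns = {key: np.array([row[key] for row in rows]) for key in rows[0]}
sorted_ds = tf.data.Dataset.from_tensor_slices(sorted_columns)

for row in sorted_ds.as_numpy_iterator():
    print(row["name"].decode(), row["group"], row["rank"])
```

If the data does not fit in memory, sorting the CSV file before loading it may be more practical than sorting inside the tf.data pipeline.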

Thank You!