Hi, is there a way to convert a PrefetchDataset to a MapDataset (which is the default dataset type in TensorFlow and also the most commonly used)? I have a TF prefetch dataset because I used tf.data.experimental.make_csv_dataset() to load a CSV file directly into a TensorFlow dataset. As the data is large, I did not want to convert it to pandas/list/NumPy before converting it to TensorFlow.
As most functions support MapDataset, I want to convert my prefetch dataset.
My dataset consists of multiple string and int columns, and I have a requirement to sort the dataset on 2 int columns.
Hi @Sudh_Kumar, the dataset type depends on how the dataset was created. For example,
If the dataset was created using dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
then the type of dataset will be
tensorflow.python.data.ops.from_tensor_slices_op._TensorSliceDataset
If the dataset was created using dataset = tf.data.Dataset.range(10)
then the type of dataset will be
tensorflow.python.data.ops.range_op._RangeDataset
A prefetch dataset is created when the .prefetch method is applied to a dataset. For example,
dataset = tf.data.Dataset.range(10)
prefetch_dataset = dataset.prefetch(3)
type(prefetch_dataset)
#output: tensorflow.python.data.ops.prefetch_op._PrefetchDataset
To convert a prefetch dataset to a map dataset, apply the .map method to prefetch_dataset. For example,
map_dataset = prefetch_dataset.map(lambda x: x)
type(map_dataset)
#output: tensorflow.python.data.ops.map_op._MapDataset
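The steps above can be put together in one short script. This is only a minimal sketch: the variable names are illustrative, and the exact private class names printed by type() may differ between TensorFlow versions. It also shows that every variant is still a tf.data.Dataset and that the identity map leaves the elements unchanged.

```python
import tensorflow as tf

# The concrete class reflects the last transformation that was applied.
base = tf.data.Dataset.range(10)
prefetched = base.prefetch(3)
mapped = prefetched.map(lambda x: x)  # identity map, elements are unchanged

for name, ds in [("base", base), ("prefetched", prefetched), ("mapped", mapped)]:
    # Each object is a tf.data.Dataset regardless of its concrete class.
    print(name, type(ds).__name__, isinstance(ds, tf.data.Dataset))

# Collect the elements as plain Python ints to compare them.
values_before = [int(v) for v in prefetched.as_numpy_iterator()]
values_after = [int(v) for v in mapped.as_numpy_iterator()]
print(values_before == values_after)
```

Since all three objects share the tf.data.Dataset interface, methods such as .map, .batch and .filter can be called on any of them.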
Thank You!
Hi Kiran, thanks a lot for your detailed reply.
I have a CSV dataset of 7-8 columns, so I used tf.data.experimental.make_csv_dataset() to create a TensorFlow dataset, and all the data is stored in ordered dictionaries.
Hi @Sudh_Kumar, once you have created a dataset using tf.data.experimental.make_csv_dataset(), the dataset type will be tensorflow.python.data.ops.prefetch_op._PrefetchDataset. You can then apply the map function to convert it to tensorflow.python.data.ops.map_op._MapDataset. For example, I have created a dataset using
import tensorflow as tf

titanic_file_path = tf.keras.utils.get_file("train.csv", "https://storage.googleapis.com/tf-datasets/titanic/train.csv")
titanic_csv_ds = tf.data.experimental.make_csv_dataset(
    titanic_file_path,
    batch_size=5,  # Artificially small to make examples easier to show.
    label_name='survived',
    num_epochs=1,
    ignore_errors=True,
)
type(titanic_csv_ds)
#output: tensorflow.python.data.ops.prefetch_op._PrefetchDataset
Now if I apply a map on titanic_csv_ds:
def map_function(features, label):
    # Identity function: returns each (features, label) pair unchanged.
    return features, label
map_dataset = titanic_csv_ds.map(map_function)
type(map_dataset)
#output: tensorflow.python.data.ops.map_op._MapDataset
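The Titanic example needs a download, so here is a rough offline variant of the same check. It is a sketch under stated assumptions: the temporary file, the column names (name, k1, k2, label) and the four rows are all hypothetical stand-ins for a real CSV. It also prints one batch to show that the features arrive as a dictionary keyed by column name.

```python
import csv
import os
import tempfile

import tensorflow as tf

# Hypothetical toy data standing in for the real CSV file.
csv_path = os.path.join(tempfile.mkdtemp(), "toy.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "k1", "k2", "label"])
    writer.writerows([["a", 2, 1, 0], ["b", 1, 2, 1], ["c", 1, 1, 0], ["d", 2, 0, 1]])

csv_ds = tf.data.experimental.make_csv_dataset(
    csv_path,
    batch_size=2,
    label_name="label",
    num_epochs=1,
    shuffle=False,  # keep file order so the printed batch is reproducible
)

def map_function(features, label):
    # features maps each column name to a batched tensor; returned unchanged.
    return features, label

map_dataset = csv_ds.map(map_function)

print(type(csv_ds).__name__)
print(type(map_dataset).__name__)

# Inspect the first batch of the mapped dataset.
features, label = next(iter(map_dataset))
print(sorted(features.keys()))
print(features["k1"].numpy(), label.numpy())
```

A non-identity map_function could be swapped in here to cast, select or combine columns; the resulting dataset is still a map dataset.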
Thank You!