I have a tf.data.Dataset that contains features and a probability. (I created the dataset by zipping my test dataset with the probabilities predicted by my binary classification model, thereby adding a probability “column” to the test dataset.)
I want to sort this dataset in descending order by probability. Can I do so directly, without resorting to converting the dataset to numpy or a pandas dataframe?
Thanks, but is there really no way of working more directly on a tf.data.Dataset and thereby maintaining the lazy evaluation, caching, and consistency that datasets afford?
Being able to do something similar with a tf.data.Dataset would be convenient and would potentially not require loading all the data into memory at the same time (or not require loading and sorting it until it is actually used).
Hi @Sudh_Kumar, once you have created a dataset with features and probabilities using tf.data.Dataset.zip, you can sort it by probability with the line of code below.
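# Replace index 1 with the position of the probability component in your dataset's elements.
# reverse=True sorts in descending order. sorted() iterates the dataset eagerly and returns a Python list.
sorted_dataset = sorted(dataset, key=lambda x: x[1], reverse=True)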
Please refer to this gist for a working code example. Thank you.
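For readers without access to the gist, here is a minimal self-contained sketch of the same idea. The toy feature and probability values are hypothetical stand-ins for the test set and the model's predictions, and the sketch assumes eager execution (the TensorFlow 2 default).

```python
import tensorflow as tf

# Hypothetical stand-ins for the test features and the predicted probabilities.
features = tf.data.Dataset.from_tensor_slices([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
probabilities = tf.data.Dataset.from_tensor_slices([0.2, 0.9, 0.5])

# Each element of the zipped dataset is a (features, probability) tuple.
dataset = tf.data.Dataset.zip((features, probabilities))

# sorted() iterates the whole dataset eagerly and returns a Python list,
# so every element is held in memory at once.
sorted_elements = sorted(dataset, key=lambda x: float(x[1]), reverse=True)

# Pull the probabilities back out to inspect the resulting order.
sorted_probabilities = [float(p) for _, p in sorted_elements]
```

Note that the result is a plain list of tuples, not a tf.data.Dataset, so the laziness and caching of the original pipeline are not preserved.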
Hi Kiran, thanks for the reply.
But I am unable to use this because my TensorFlow dataset is a PrefetchDataset: I used the make_csv_dataset() function to read a CSV directly into a tf.data dataset, since it is a large dataset and I did not want to convert it to pandas, NumPy, or a list.
So is there a way I can sort my prefetched TensorFlow dataset?
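As far as I know, tf.data has no built-in sort transformation, and sorting is inherently a global operation, so at least the sort keys have to be seen in full before the first element can be emitted. One possible workaround that stays inside TensorFlow is sketched below: concatenate the batches into column tensors, order them with tf.argsort and tf.gather, and rebuild a dataset. The helper name sort_by_key, the column names, and the toy data are hypothetical. The sketch assumes a finite dataset (for make_csv_dataset that means passing num_epochs=1) whose columns fit in memory.

```python
import tensorflow as tf


def sort_by_key(dataset, key_fn, descending=True):
    """Hypothetical helper: sort the rows of a finite, batched dataset.

    key_fn receives the full column structure and returns a 1-D tensor of sort keys.
    """
    # A single pass over the dataset, so shuffling upstream cannot misalign rows.
    batches = list(dataset)
    # Concatenate matching components of every batch into full-length columns.
    columns = tf.nest.map_structure(
        lambda *parts: tf.concat(parts, axis=0), *batches
    )
    direction = "DESCENDING" if descending else "ASCENDING"
    order = tf.argsort(key_fn(columns), direction=direction)
    # Reorder every column with the same permutation.
    sorted_columns = tf.nest.map_structure(lambda t: tf.gather(t, order), columns)
    return tf.data.Dataset.from_tensor_slices(sorted_columns)


# Toy stand-in for the batched, prefetched output of make_csv_dataset.
data = {
    "feature": tf.constant([10, 20, 30, 40, 50]),
    "probability": tf.constant([0.1, 0.8, 0.3, 0.95, 0.5]),
}
batched = tf.data.Dataset.from_tensor_slices(data).batch(2).prefetch(1)

sorted_ds = sort_by_key(batched, lambda cols: cols["probability"])
sorted_features = [int(row["feature"]) for row in sorted_ds]
```

The returned dataset is unbatched and can be batched, cached, or prefetched again as needed. This does not avoid loading the data into memory; it only avoids the detour through pandas or NumPy.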