Dataset from pandas DataFrame that has list

Spencer_Uresk · August 15, 2022, 1:46pm

I am having a bit of a hard time wrapping my brain around the dataset functionality. I have a CSV file that looks like this:

gender,products
M,[1,2]
M,[1,7,5,8]

I normally would create a dataset by doing something like:

tf.data.Dataset.from_tensor_slices(dict(df))

But TF complains about not being able to figure out the type spec from a list or numpy array. I tried to just leave it as a string and convert it in a map function with something like this:

@tf.function
def eval_products(x):
    y = {}
    y.update(x)
    if tf.strings.length(x['products']) > 2:
        arr = tf.strings.split(tf.strings.substr(x['products'], 1, tf.strings.length(x['products'])-2), ',')
        y['products'] = tf.strings.to_number(tf.strings.strip(arr))
    else:
        y['products'] = tf.ragged.constant([-1.])
    return y

ds = ds.map(eval_products)

This works, but feels really messy, and I can’t get it to make products a RaggedTensor instead of a plain Tensor. It seems like I’m missing something really basic, but I am not sure what it is and I can’t find any examples in the docs or anywhere else, so any ideas are appreciated.

Kiran_Sai_Ramineni · November 29, 2023, 12:48pm

Hi @Spencer_Uresk, I have tried to convert the products to ragged tensors with a similar type of code.

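def convert_to_ragged(gender, products):
    # Assumes `products` is a single (scalar) string such as "[1,7,5,8]".
    # Strip brackets and quotes; the raw string keeps the regex escapes intact.
    products = tf.strings.regex_replace(products, r"[\[\]']", "")
    split_products = tf.strings.split(products, ',')
    # Wrap the tokens as one ragged row. The row length is the number of
    # tokens, not the character length of each token.
    row_lengths = [tf.size(split_products, out_type=tf.int64)]
    ragged_tensor = tf.RaggedTensor.from_row_lengths(split_products, row_lengths=row_lengths)
    return gender, ragged_tensor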
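
As a rough sketch of a related approach, batching before the map lets tf.strings.split return a RaggedTensor on its own, since splitting a rank-1 string tensor yields one ragged row per string. The function name parse_products and the two in-memory rows are made up for illustration, and the int32 output type is an assumption about the product IDs.

```python
import tensorflow as tf


def parse_products(gender, products):
    # `products` is a batch (rank-1 tensor) of strings such as b"[1,7,5,8]".
    cleaned = tf.strings.regex_replace(products, r"[\[\]\s]", "")
    # Splitting a rank-1 string tensor returns a RaggedTensor, one row per string.
    tokens = tf.strings.split(cleaned, ",")
    # to_number is applied elementwise, so the result stays ragged.
    return gender, tf.strings.to_number(tokens, out_type=tf.int32)


# Hypothetical data mirroring the CSV in the question.
ds = tf.data.Dataset.from_tensor_slices((["M", "M"], ["[1,2]", "[1,7,5,8]"]))
ds = ds.batch(2).map(parse_products)

# Collect the ragged product batches as nested Python lists.
result = [products.to_list() for _, products in ds]
print(result)
```

The main design point is the order of operations: mapping before batching would hand the function one scalar string at a time, and the split would then give a plain 1-D Tensor rather than a RaggedTensor.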

Please refer to this gist for the complete code, and let us know if it helps. Thank you.
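
If the data is already in a pandas DataFrame, another option that may be simpler is to parse the lists in Python and hand from_tensor_slices a RaggedTensor directly. The sketch below assumes the products column holds strings like "[1,2]" (as it would after reading a quoted CSV cell); the DataFrame contents and variable names are illustrative only.

```python
import ast

import pandas as pd
import tensorflow as tf

# Hypothetical frame mirroring the CSV in the question; lists are stored as strings.
df = pd.DataFrame({"gender": ["M", "M"], "products": ["[1,2]", "[1,7,5,8]"]})

# Parse each string into a real Python list, then build one RaggedTensor.
product_lists = [ast.literal_eval(s) for s in df["products"]]
features = {
    "gender": tf.constant(df["gender"].tolist()),
    "products": tf.ragged.constant(product_lists, dtype=tf.int64),
}

# from_tensor_slices accepts RaggedTensor components; each dataset element
# then carries one variable-length row of product IDs.
ds = tf.data.Dataset.from_tensor_slices(features)

rows = [element["products"].numpy().tolist() for element in ds]
print(rows)
```

This keeps the string parsing out of the TensorFlow graph entirely, which is convenient when the DataFrame fits in memory. For data streamed from disk, parsing inside a map function is likely the better fit.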
