TensorFlow with shape=<unknown> after tf.data.Dataset.from_generator

Rhaymison_Cristian · September 22, 2023, 4:10am

I’m trying to generate a tensor from a dataset of the following format:


    [
    ([[101, 4640, 8684, 2443, 3874, 5772, 6388, 1280, 102], [1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0]], 1),
    ([[101, 4102, 293, 3718, 249, 598, 5772, 6388, 1280, 102], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], 0), 
    ([[101, 169, 1382, 2534, 5772, 6388, 1280, 5457, 20073, 102], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], 0)
    ...


    all_dataset = tf.data.Dataset.from_generator(lambda: sorted_all,
                                                     output_types=(tf.int32, tf.int32))

My all_dataset has the following element spec:


    <_FlatMapDataset element_spec=(TensorSpec(shape=<unknown>, dtype=tf.int32, name=None), TensorSpec(shape=<unknown>, dtype=tf.int32, name=None))>

Next, I need to pass all_dataset to padded_batch:


    all_batched = all_dataset.padded_batch(BATCH_SIZE,
                                           padded_shapes=((3, None), ()),
                                           padding_values=(0, 0))

all_batched in turn has None dimensions in its element spec, which breaks my application:


    <_PaddedBatchDataset element_spec=(TensorSpec(shape=(None, 3, None), dtype=tf.int32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>

I’m using TensorFlow 2.12.1, and downgrading to an earlier version is not an option in this project. Does anyone have a viable solution for this case?
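For readers hitting the same thing: in the padded_batch output above, the leading None is the batch dimension, which stays unknown because the last batch may be smaller than BATCH_SIZE, and the trailing None is the padded sequence length, which padded_batch picks per batch when the padded shape is None. A minimal sketch, assuming a small sample copied from the question and a hypothetical MAX_LEN of 16 (any value at least as long as the longest example), shows how drop_remainder=True plus a fixed padded length yields fully static shapes:

```python
import tensorflow as tf

# Sample in the question's layout: ([ids, mask, type_ids], label).
# Only three examples are copied here; the real dataset is longer.
sorted_all = [
    ([[101, 4640, 8684, 2443, 3874, 5772, 6388, 1280, 102], [1] * 9, [0] * 9], 1),
    ([[101, 4102, 293, 3718, 249, 598, 5772, 6388, 1280, 102], [1] * 10, [0] * 10], 0),
    ([[101, 169, 1382, 2534, 5772, 6388, 1280, 5457, 20073, 102], [1] * 10, [0] * 10], 0),
]

BATCH_SIZE = 2  # small so the sample fills exactly one full batch
MAX_LEN = 16    # hypothetical maximum length; must be >= the longest example

all_dataset = tf.data.Dataset.from_generator(
    lambda: sorted_all,
    output_signature=(
        tf.TensorSpec(shape=(3, None), dtype=tf.int32),  # 3 rows, variable length
        tf.TensorSpec(shape=(), dtype=tf.int32),         # scalar label
    ),
)

# Padded length None: batch dim and length dim are both unknown statically.
dynamic = all_dataset.padded_batch(BATCH_SIZE, padded_shapes=((3, None), ()))
print(dynamic.element_spec)

# Fixed padded length plus drop_remainder=True: every dimension is static.
static = all_dataset.padded_batch(
    BATCH_SIZE,
    padded_shapes=((3, MAX_LEN), ()),
    padding_values=(0, 0),
    drop_remainder=True,  # the trailing partial batch is discarded
)
print(static.element_spec)

x, y = next(iter(static))
print(x.shape, y.shape)  # (2, 3, 16) (2,)
```

The trade-off: an example longer than MAX_LEN makes padded_batch fail at iteration time, and drop_remainder=True drops the final partial batch.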

Kiran_Sai_Ramineni · September 22, 2023, 6:27am

Hi @Rhaymison_Cristian, if you don’t pass the output_signature argument to the from_generator method, the shape will be unknown. For example:

    dataset = tf.data.Dataset.from_generator(data_generator, output_types=(tf.float32, tf.int32))

    dataset.element_spec
    # output
    (TensorSpec(shape=<unknown>, dtype=tf.float32, name=None),
     TensorSpec(shape=<unknown>, dtype=tf.int32, name=None))

If you pass the shape of the data through output_signature, element_spec reports that shape. Also note that the declared shape must match the shape of the elements the generator actually yields; otherwise iterating the dataset fails with an error.

    dataset = tf.data.Dataset.from_generator(data_generator, output_signature=(tf.TensorSpec(shape=(2,), dtype=tf.int32)))

    dataset.element_spec
    # output
    TensorSpec(shape=(2,), dtype=tf.int32, name=None)
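Applied to the data in the question, each element is a 2-tuple of a (3, length) int32 matrix and a scalar int32 label, so output_signature needs one tf.TensorSpec per tuple component, nested exactly like the yielded elements. A minimal sketch, using two examples copied from the question, compares the two ways of calling from_generator:

```python
import tensorflow as tf

# Two examples copied from the question: ([ids, mask, type_ids], label).
sorted_all = [
    ([[101, 4640, 8684, 2443, 3874, 5772, 6388, 1280, 102], [1] * 9, [0] * 9], 1),
    ([[101, 4102, 293, 3718, 249, 598, 5772, 6388, 1280, 102], [1] * 10, [0] * 10], 0),
]

# output_types only (deprecated): dtypes are known, shapes are not.
untyped = tf.data.Dataset.from_generator(
    lambda: sorted_all, output_types=(tf.int32, tf.int32))

# output_signature: one TensorSpec per component of the yielded tuple.
typed = tf.data.Dataset.from_generator(
    lambda: sorted_all,
    output_signature=(
        tf.TensorSpec(shape=(3, None), dtype=tf.int32),  # length varies per example
        tf.TensorSpec(shape=(), dtype=tf.int32),         # scalar label
    ),
)

print(untyped.element_spec)  # shapes are <unknown>
print(typed.element_spec)    # shapes are (3, None) and ()

x, y = next(iter(typed))
print(x.shape, y.shape)  # (3, 9) ()
```

With output_signature set, from_generator also checks each yielded element against the declared structure and shapes when the dataset is iterated.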

Thank you.

Rhaymison_Cristian · September 22, 2023, 10:01am

@Kiran_Sai_Ramineni
Thanks for the feedback. The change you suggested got me further. However, when I call padded_batch:

    BATCH_SIZE = 32
    all_batched = all_dataset.padded_batch(BATCH_SIZE,
                                           padded_shapes=((2, None), ()),
                                           padding_values=(0, 0))

I get the following error:

    TypeError: If shallow structure is a sequence, input must also be a sequence. Input has type: 'ndarray'.

For context: at the end of this pipeline I’m trying to do classification with DCNNBERTEmbedding.

These None dimensions are exactly what cause the final error:

    Call arguments received by layer 'dcnn' (type DCNNBERTEmbedding):
      • inputs=tf.Tensor(shape=(None, 2), dtype=int32)
      • training=True

Thank you in advance.
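The TypeError above is the message tf.nest raises when it expects a nested structure (here, a tuple) and gets a single value instead. One way to reproduce it, sketched below with hypothetical generators bad_gen and good_gen, is a generator that yields a bare NumPy array while output_signature declares a (features, label) tuple; yielding a tuple fixes it. This is only one possible cause, since the updated generator code is not shown in the thread:

```python
import numpy as np
import tensorflow as tf

signature = (
    tf.TensorSpec(shape=(3, None), dtype=tf.int32),  # token matrix
    tf.TensorSpec(shape=(), dtype=tf.int32),         # label
)

def bad_gen():
    # Hypothetical: yields only the array, not a (features, label) tuple.
    yield np.zeros((3, 5), dtype=np.int32)

def good_gen():
    # Hypothetical: yields a tuple that mirrors the 2-tuple signature.
    yield np.zeros((3, 5), dtype=np.int32), np.int32(1)

bad = tf.data.Dataset.from_generator(bad_gen, output_signature=signature)
try:
    # The mismatch is only detected when an element is produced.
    next(iter(bad))
    structure_error = False
except (TypeError, tf.errors.InvalidArgumentError) as err:
    structure_error = True
    print(type(err).__name__)

good = tf.data.Dataset.from_generator(good_gen, output_signature=signature)
x, y = next(iter(good))
print(x.shape, y.shape)  # (3, 5) ()
```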

Kiran_Sai_Ramineni · September 22, 2023, 10:18am

Hi @Rhaymison_Cristian, instead of passing padded_shapes=((2, None), ()), could you please try passing the shapes as a dictionary mapping, like padded_shapes={'x': [2, ], 'y': [None]}? Thank you.
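A dict for padded_shapes only works when each dataset element is itself a dict with the same keys, because padded_shapes has to mirror the element structure, and each padded shape must have the same rank as its component. A minimal sketch, assuming the hypothetical keys 'x' (the (3, length) token matrix) and 'y' (a scalar label) and two examples copied from the question, re-keys the tuples with map and pads accordingly:

```python
import tensorflow as tf

# Two examples copied from the question: ([ids, mask, type_ids], label).
sorted_all = [
    ([[101, 4640, 8684, 2443, 3874, 5772, 6388, 1280, 102], [1] * 9, [0] * 9], 1),
    ([[101, 4102, 293, 3718, 249, 598, 5772, 6388, 1280, 102], [1] * 10, [0] * 10], 0),
]

dataset = tf.data.Dataset.from_generator(
    lambda: sorted_all,
    output_signature=(
        tf.TensorSpec(shape=(3, None), dtype=tf.int32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
)

# Hypothetical keys: turn each (x, y) tuple into a dict so a dict of
# padded shapes has a matching structure.
dict_dataset = dataset.map(lambda x, y: {"x": x, "y": y})

batched = dict_dataset.padded_batch(
    2,
    padded_shapes={"x": [3, None], "y": []},  # rank 2 for x, rank 0 for the scalar y
    drop_remainder=True,
)
print(batched.element_spec)

batch = next(iter(batched))
print(batch["x"].shape)  # (2, 3, 10): padded to the longest example in the batch
```

Under these assumptions the label’s padded shape is [] rather than [None], since a scalar has rank 0.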

Related topics

- Hey guys I do need you! - For some reason, my model is getting a shape error! (General Discussion, November 8, 2022)
- <BatchDataset shapes: ((None, 32, 32, 3), (None,)), types: (tf.float32, tf.int64)> (General Discussion, July 19, 2024)
- Dataset map function returns wrong tensor shape (TensorFlow, September 11, 2023)
- tf.data.Dastaset.from_generator has error when having multiple input (General Discussion, January 12, 2024)
- Addressing Shape Mismatch Error in TensorFlow Code for (None, 224, 224, 3) vs. (TensorSpec(shape=(None, None, 224, 224, 3)) Shape: Troubleshooting and Resolution (General Discussion, January 26, 2024)