TensorFlow with shape=<unknown> after tf.data.Dataset.from_generator

I’m trying to generate a tensor from a dataset of the following format:


    [
    ([[101, 4640, 8684, 2443, 3874, 5772, 6388, 1280, 102], [1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0]], 1),
    ([[101, 4102, 293, 3718, 249, 598, 5772, 6388, 1280, 102], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], 0), 
    ([[101, 169, 1382, 2534, 5772, 6388, 1280, 5457, 20073, 102], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], 0)
    ,....


    all_dataset = tf.data.Dataset.from_generator(lambda: sorted_all,
                                                     output_types=(tf.int32, tf.int32))

My all_dataset has the following element_spec:


    <_FlatMapDataset element_spec=(TensorSpec(shape=<unknown>, dtype=tf.int32, name=None), TensorSpec(shape=<unknown>, dtype=tf.int32, name=None))>

I then need to pass all_dataset to the next step in the pipeline:


     all_batched = all_dataset.padded_batch(BATCH_SIZE,
                                               padded_shapes=((3, None), ()),
                                               padding_values=(0, 0))

all_batched in turn has an element_spec containing None dimensions, which breaks my application:


    <_PaddedBatchDataset element_spec=(TensorSpec(shape=(None, 3, None), dtype=tf.int32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>

I’m using TensorFlow version 2.12.1, and downgrading to a previous version is not an option in this project. Does anyone have a viable solution for this case?

Hi @Rhaymison_Cristian, if you don’t pass the output_signature argument to the from_generator method, the shape will be unknown. For example:

    # data_generator is any Python generator function
    dataset = tf.data.Dataset.from_generator(data_generator, output_types=(tf.int32, tf.int32))

    dataset.element_spec
    # output
    (TensorSpec(shape=<unknown>, dtype=tf.int32, name=None),
     TensorSpec(shape=<unknown>, dtype=tf.int32, name=None))

If you pass output_signature with the shape of the data the generator yields, element_spec reports that shape. Please note that the declared shape must match the shape of the elements the generator actually produces, otherwise you will run into further errors.

    dataset = tf.data.Dataset.from_generator(data_generator, output_signature=(tf.TensorSpec(shape=(2,), dtype=tf.int32)))

    dataset.element_spec
    # output
    TensorSpec(shape=(2,), dtype=tf.int32, name=None)
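Applied to the data layout from the question, this could look like the sketch below. It is only an illustration under some assumptions: the two toy samples, the helper name `gen`, and the batch size of 2 are made up for the example, and each element is assumed to be a `(features, label)` tuple where `features` has 3 rows of equal but variable length.

```python
import tensorflow as tf

# Toy data in the same layout as the question: ([ids, mask, segments], label).
# These two samples are invented for illustration.
sorted_all = [
    ([[101, 4640, 102], [1, 1, 1], [0, 0, 0]], 1),
    ([[101, 4102, 293, 102], [1, 1, 1, 1], [0, 0, 0, 0]], 0),
]

def gen():
    # Hypothetical generator: yields one (features, label) tuple per sample.
    for features, label in sorted_all:
        yield features, label

# Declare the per-element structure: 3 rows of unknown length, plus a scalar label.
all_dataset = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=(3, None), dtype=tf.int32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
)

# padded_shapes mirrors the tuple structure declared in output_signature.
all_batched = all_dataset.padded_batch(
    2,
    padded_shapes=((3, None), ()),
    padding_values=(0, 0),
)

features, labels = next(iter(all_batched))
print(features.shape)  # concrete shape of the first batch
print(labels.numpy())
```

The None entries that remain in `all_batched.element_spec` are static placeholders only: the sequence dimension is None because each batch is padded to its own longest sample, and the batch dimension is None unless `drop_remainder=True` is passed to `padded_batch`. The tensors produced at iteration time have concrete shapes.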

Thank You.

@Kiran_Sai_Ramineni
Thanks for the feedback. After making the change you suggested, I made progress. However, when I get to this call:

    BATCH_SIZE = 32
    all_batched = all_dataset.padded_batch(BATCH_SIZE,
                                           padded_shapes=((2, None), ()),
                                           padding_values=(0, 0))

I get the following error:

TypeError: If shallow structure is a sequence, input must also be a sequence. Input has type: 'ndarray'.

Note: To give you a little context, at the end of this process I’m trying to do a class analysis with DCNNBERTEmbedding.

It is precisely these shapes with None that result in the final error:

  Call arguments received by layer 'dcnn' (type DCNNBERTEmbedding):
       • inputs=tf.Tensor(shape=(None, 2), dtype=int32)
       • training=True

Thank you in advance.

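Hi @Rhaymison_Cristian, instead of passing padded_shapes=((2, None), ()), could you please try passing the shapes as a dictionary mapping, like padded_shapes={'x': [2], 'y': [None]}? Note that padded_shapes has to mirror the structure of the dataset elements, so this form requires the elements to be dictionaries with the same keys. Thank you.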
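As a rough sketch of the dictionary form, the example below builds a dataset whose elements are dictionaries and pads it with a matching dictionary of shapes. The keys 'x' and 'y', the toy samples, the helper name `dict_gen`, and the (3, None) feature shape are assumptions chosen to match the data in the question, not something taken from the original code.

```python
import tensorflow as tf

# Invented samples: each element is a dict with features 'x' and a label 'y'.
samples = [
    {"x": [[101, 4640, 102], [1, 1, 1], [0, 0, 0]], "y": 1},
    {"x": [[101, 4102, 293, 102], [1, 1, 1, 1], [0, 0, 0, 0]], "y": 0},
]

def dict_gen():
    # Hypothetical generator yielding one dict per sample.
    for sample in samples:
        yield sample

# output_signature is a dict, so every element of the dataset is a dict too.
dict_dataset = tf.data.Dataset.from_generator(
    dict_gen,
    output_signature={
        "x": tf.TensorSpec(shape=(3, None), dtype=tf.int32),
        "y": tf.TensorSpec(shape=(), dtype=tf.int32),
    },
)

# padded_shapes uses the same keys as the elements.
dict_batched = dict_dataset.padded_batch(
    2,
    padded_shapes={"x": (3, None), "y": ()},
)

batch = next(iter(dict_batched))
print(batch["x"].shape)
print(batch["y"].numpy())
```

If the dataset elements are tuples, as in the original question, the tuple form of padded_shapes is the one that matches.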