I am currently working on a project that uses Hugging Face. I created a Hugging Face dataset and converted it to a TensorFlow dataset. For the conversion I did not use from_tensor_slices(), the method shown in their documentation, but from_generator(). I found this method a lot faster, but when training with TFTrainer() I encounter an error:
ValueError: The training dataset must have an asserted cardinality
I checked and found that the cause was from_generator(). In order to verify this, I created a very basic dataset using the from_generator() method and checked its cardinality:
import tensorflow as tf
# The generator yields 1000 scalar values, so the signature is a scalar int64.
dumm_ds = tf.data.Dataset.from_generator(lambda: [1] * 1000, output_signature=tf.TensorSpec(shape=(), dtype=tf.int64))
tf.data.experimental.cardinality(dumm_ds)
Output:
<tf.Tensor: shape=(), dtype=int64, numpy=-2>
where -2 means tf.data.experimental.UNKNOWN_CARDINALITY.
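For comparison, here is a minimal sketch that builds the same 1000 values both ways and prints the cardinality of each. The names sliced_ds and generated_ds are only illustrative, and range(1000) stands in for real data.

```python
import tensorflow as tf

# Built from an in-memory tensor: the number of elements is known up front.
sliced_ds = tf.data.Dataset.from_tensor_slices(tf.range(1000, dtype=tf.int64))

# Built from a Python generator: TensorFlow cannot tell how many items it will yield.
generated_ds = tf.data.Dataset.from_generator(
    lambda: range(1000),
    output_signature=tf.TensorSpec(shape=(), dtype=tf.int64),
)

sliced_card = tf.data.experimental.cardinality(sliced_ds).numpy()
generated_card = tf.data.experimental.cardinality(generated_ds).numpy()

print(sliced_card)     # 1000
print(generated_card)  # -2
print(generated_card == tf.data.experimental.UNKNOWN_CARDINALITY)  # True
```

So the unknown cardinality seems to come from how the dataset is built rather than from the data itself.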
I would like to know whether this is a bug and, if it is not, how I can change the cardinality.
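One possible approach, which the wording of the error message seems to point to, is tf.data.experimental.assert_cardinality. The sketch below assumes the number of examples is known ahead of time; NUM_EXAMPLES and the range-based generator are placeholders, and whether this resolves the TFTrainer() error is not verified here.

```python
import tensorflow as tf

NUM_EXAMPLES = 1000  # assumption: the dataset size is known in advance

generated_ds = tf.data.Dataset.from_generator(
    lambda: range(NUM_EXAMPLES),
    output_signature=tf.TensorSpec(shape=(), dtype=tf.int64),
)

# Attach the known size to the dataset; the data itself is unchanged.
asserted_ds = generated_ds.apply(
    tf.data.experimental.assert_cardinality(NUM_EXAMPLES)
)

asserted_card = tf.data.experimental.cardinality(asserted_ds).numpy()
print(asserted_card)  # 1000
```

The asserted number has to match what the generator actually yields; if it does not, TensorFlow is expected to raise an error while iterating over the dataset.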