Hi,
Suppose I have a `tf.data.Dataset` of rank-1 tensors of varying lengths, say up to 100. I would like to create a new dataset in which each element is a 1-D stack of a varying number of consecutive elements from the first dataset, with a total length of less than 200.
I know I can do this using `tf.data.Dataset.from_generator`, but is there a better way?
Thanks in advance,
Henry
Hi @hrbigelow,
You can achieve this using `tf.data.Dataset.window()` followed by `flat_map()`.
- Use `window()` to create sliding windows of consecutive elements.
- Set the `size` parameter of `window()` to a large value (e.g. 10) and `shift=1`.
- Apply `flat_map()` to each window.
- Inside `flat_map()`, stack the tensors and filter based on total length < 200.
This approach can be more efficient than `from_generator` because it stays within TensorFlow's built-in dataset operations, which run inside the input pipeline rather than in Python and can therefore offer better performance and parallelism.
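The steps above can be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: the helper name `pack_consecutive`, the constants, and the toy lengths are all made up for the example. It also takes one particular reading of the filter step: instead of discarding a whole window, it keeps the longest prefix of each window whose total length stays below the limit. Because `shift=1`, every source element starts its own group, so the resulting groups overlap.

```python
import tensorflow as tf

MAX_TOTAL = 200   # length limit from the question
WINDOW_SIZE = 10  # assumed upper bound on how many elements one group can hold


def pack_consecutive(ds, max_total=MAX_TOTAL, window_size=WINDOW_SIZE):
    """Hypothetical helper: one packed 1-D tensor per starting position."""

    def concat_prefix(padded, lengths):
        # Count how many leading elements fit while the running total
        # stays strictly below max_total.
        running = tf.cumsum(lengths)
        k = tf.reduce_sum(tf.cast(running < max_total, tf.int32))
        # Drop the padding and flatten the kept rows in order.
        mask = tf.sequence_mask(lengths[:k], maxlen=tf.shape(padded)[1])
        return tf.boolean_mask(padded[:k], mask)

    def pack_window(window):
        # Each window is itself a small dataset of rank-1 tensors.
        lengths = window.map(lambda x: tf.shape(x)[0]).batch(window_size)
        padded = window.padded_batch(window_size)
        return tf.data.Dataset.zip((padded, lengths)).map(concat_prefix)

    return ds.window(window_size, shift=1).flat_map(pack_window)


# Toy source: rank-1 tensors of varying length (values are just 0..n-1).
toy_lengths = [30, 80, 100, 50, 60, 90, 20]
source = tf.data.Dataset.from_tensor_slices(toy_lengths).map(lambda n: tf.range(n))

packed = pack_consecutive(source)
for element in packed:
    print(int(tf.size(element)))
```

`WINDOW_SIZE` has to be at least as large as the maximum number of elements a group could contain, so very short source tensors would need a larger value than 10. If non-overlapping groups are needed, this sliding-window sketch is not sufficient on its own.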
Thank you.