Hi y'all. I'm trying to implement a time-series prediction model, and my current implementation is heavily input-bound. To be more specific, what I want sort of resembles `keras.utils.timeseries_dataset_from_array`. However, I need a custom index-retrieval process: only a subset of indices is valid, and I can only sample data pairs like `(X[i-seq_len+1:i], y[i])` for all `i` in the valid indices. On top of that, I found this implementation too slow, with the majority of the time spent retrieving (iterating over) data, so I need a more efficient one.
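For concreteness, here is a minimal sketch of the kind of pipeline I mean. This is not my exact code: `seq_len`, `X`, `y`, and `valid_idx` below are toy placeholders, and the naive `from_generator` loop stands in for my current slow retrieval:

```python
import numpy as np
import tensorflow as tf

# toy placeholders -- my real X, y, seq_len, and valid_idx come from elsewhere
seq_len = 64
X = np.random.randn(100_000, 8).astype("float32")
y = np.random.randn(100_000).astype("float32")
valid_idx = np.arange(seq_len, len(X))  # only these indices may be sampled

def gen():
    # yield one (window, target) pair per valid index, as described above
    for i in valid_idx:
        yield X[i - seq_len + 1 : i], y[i]

train_dataset = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=(seq_len - 1, X.shape[1]), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.float32),
    ),
)
```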
Therefore, I applied the `tf.data.Dataset.interleave` API to my dataset and immediately got a decent performance improvement. However, when I looked at the trace viewer in the TensorFlow Profiler, I noticed that there were only 5 threads in `tf_data_private_threadpool`.
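Roughly, the interleave pattern looks like the sketch below (again simplified, reusing the placeholder `X`, `y`, `seq_len`, and `valid_idx` from above): the valid indices are sharded into `NUM_PARALLELS` strided slices, and one sub-dataset per shard is interleaved.

```python
NUM_PARALLELS = 16

# keep everything as TF tensors so the per-element work is pure TF ops
X_t = tf.constant(X)
y_t = tf.constant(y)
valid_idx_t = tf.constant(valid_idx, dtype=tf.int64)

def fetch_pair(i):
    # the (X[i-seq_len+1:i], y[i]) pair described above
    return X_t[i - seq_len + 1 : i], y_t[i]

def make_shard(shard_id):
    # each shard walks a disjoint strided slice of the valid indices
    idx = valid_idx_t[shard_id::NUM_PARALLELS]
    return tf.data.Dataset.from_tensor_slices(idx).map(fetch_pair)

train_dataset = tf.data.Dataset.range(NUM_PARALLELS).interleave(
    make_shard,
    cycle_length=NUM_PARALLELS,
    num_parallel_calls=NUM_PARALLELS,
    deterministic=False,  # sample order does not need to be preserved
)
```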
I also tried the following tricks:

```python
# raise TF's global inter-/intra-op thread-pool limits
tf.config.threading.set_inter_op_parallelism_threads(16)
tf.config.threading.set_intra_op_parallelism_threads(16)

# give the pipeline a larger private tf.data thread pool
options = tf.data.Options()
options.threading.private_threadpool_size = NUM_PARALLELS  # = 16
train_dataset = train_dataset.with_options(options)
```

but nothing improved.
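For completeness, the merged options can be read back from the dataset as a sanity check that the setting was actually attached:

```python
# sanity check: read the merged options back from the dataset
opts = train_dataset.options()
print(opts.threading.private_threadpool_size)  # expected to print 16
```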
Here is the step time of my current implementation:

[screenshot: step-time breakdown from the Profiler]

and here is the trace-viewer result:

[screenshot: trace viewer showing the tf_data_private_threadpool threads]
It seems that only 5 threads are responsible for generating data, even though the parallelism of the `interleave` stage is set to 16 (which also equals `NUM_PARALLELS` in the code above).
I don't know how to increase the number of threads; I've tried setting the parameters above, but nothing changed. Could anyone help me increase the parallelism of my tf.data pipeline?