Hi,
I want to use tf.data.Dataset as the main building block in my data pipeline for training a neural network on time series with TensorFlow, ideally without resorting to custom data loader classes.
Question: How do you perform preprocessing that requires more than TensorFlow can express, i.e. operations that require, for example, NumPy input and therefore cannot be integrated into the TensorFlow graph?
Example: Given time series data from different sources, I would like to resample the series onto a common time grid so that they can be combined in a single training dataset. How can that be achieved?
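To make the example concrete, here is a minimal sketch of the kind of resampling I mean, done purely in NumPy/SciPy; the time grids and the common target grid are made up for illustration:

```python
import numpy as np
from scipy.interpolate import interp1d

def resample(t, x, t_new):
    # Interpolate a series x, sampled at (possibly non-equidistant)
    # times t, onto the new time grid t_new.
    return interp1d(t, x, axis=0, fill_value="extrapolate")(t_new)

# Two sources with different, non-equidistant sampling:
t_a = np.array([0.0, 0.4, 1.1, 1.5, 2.3])
x_a = np.sin(t_a)
t_b = np.array([0.0, 0.5, 1.0, 1.7, 2.5])
x_b = np.cos(t_b)

# Common equidistant grid on which both series can be combined:
t_common = np.linspace(0.0, 2.0, 21)
x_a_resampled = resample(t_a, x_a, t_common)
x_b_resampled = resample(t_b, x_b, t_common)
```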
The reasoning behind pipeline-integrated transformations is that they take only around 10 minutes on the whole dataset I use. Hence, I am happy to perform them prior to each training run instead of deriving a dedicated, preprocessed dataset once.
I am aware of similar questions here (like this). I am also aware of TensorFlow Transform and Keras preprocessing layers, but none of those options allows for, e.g., interpolation. There exists a TF implementation of interpolation, but unfortunately it only works on an equidistant grid. An interesting implementation of interpolation in TensorFlow is this one; however, I would much prefer to use the existing implementations in SciPy or NumPy.
What is your workflow for implementing preprocessing steps that are easy with NumPy and the like, when one-off performance is not crucial? Maybe using a custom data loader is in fact easier than relying on tf.data.Dataset for those preprocessing steps?
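To show what I mean by pipeline-integrated, here is a sketch of the kind of integration I have in mind, wrapping the SciPy-based resample helper from above with tf.numpy_function; the input tensors and shapes are placeholders, and I am not sure whether this is the idiomatic approach:

```python
import numpy as np
import tensorflow as tf
from scipy.interpolate import interp1d

t_common = np.linspace(0.0, 2.0, 21)  # common target grid (illustrative)

def resample(t, x, t_new):
    return interp1d(t, x, axis=0, fill_value="extrapolate")(t_new)

def tf_resample(t, x):
    # tf.numpy_function executes the Python/NumPy code eagerly, outside
    # the graph, so static shape information is lost and must be restored.
    x_new = tf.numpy_function(
        lambda t_, x_: resample(t_, x_, t_common).astype(np.float32),
        inp=[t, x],
        Tout=tf.float32,
    )
    x_new.set_shape([len(t_common)])
    return x_new

# `times` and `values` stand in for the per-series tensors of my sources.
times = tf.constant([[0.0, 0.4, 1.1, 1.5, 2.3]])
values = tf.constant([[0.0, 0.39, 0.89, 1.0, 0.75]])
dataset = (
    tf.data.Dataset.from_tensor_slices((times, values))
    .map(tf_resample, num_parallel_calls=tf.data.AUTOTUNE)
)
```

Is something along these lines the recommended way, or is it considered an anti-pattern compared to a custom data loader?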
Thanks and best wishes!