Using TF timeseries_dataset_from_array with multiple samples

I have to handle a large number of samples, where each sample contains its own time series. The goal is to feed this data into a TensorFlow LSTM model and predict some features. I have used timeseries_dataset_from_array to build the input pipeline for the model, but I haven't figured out how to set it up when I have multiple samples. If I use the usual pipeline, timeseries_dataset_from_array produces windows that span the boundary between two individual samples.

Does anyone have an idea how to effectively pass a time series of multiple samples to the TF model?

For example, the Human Activity Recognition dataset is one such dataset: each person has a separate, long time series, and each user's time series can be further split with a sliding/rolling-window function such as timeseries_dataset_from_array.

Here is a simpler example:

I want to use timeseries_dataset_from_array to generate samples for the TF model. In the example, column 0 marks the sample: sample 1 is where column 0 is 0, and sample 2 starts where column 0 is 100.

I want to get 3D data (samples, timesteps, features) in which no window overlaps two samples. For the code below, that would be shape (8,2,7): 4 windows from each of the 2 samples.

Here is the sample code:

from tensorflow.keras.preprocessing import timeseries_dataset_from_array
import numpy as np

x = np.array([[0,1,2,3,4,5,6],
              [0,11,12,13,14,15,16],
              [0,21,22,23,24,25,26],
              [0,31,32,33,34,35,36],
              [0,41,42,43,44,45,46]
              ])

# stack the two samples row-wise: rows 0-4 are sample 1, rows 5-9 are sample 2
xx = np.concatenate((x, x+100), axis=0)  # shape (10, 7)

sequence_length=2
stride=1
rate=1
input_dataset = timeseries_dataset_from_array(xx,
                                              None,
                                              sequence_length,
                                              sequence_stride=stride,
                                              sampling_rate=rate)

# shape (9, 2, 7): the window made of rows 4 and 5 mixes the two samples
x_test = np.concatenate([batch for batch in input_dataset], axis=0)
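
To make the problem concrete, the overlap can be checked without TensorFlow. This is only a rough sketch: it assumes NumPy's sliding_window_view is an acceptable stand-in for timeseries_dataset_from_array with sequence_stride=1 and sampling_rate=1, and it counts the windows whose rows come from both samples.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.array([[0, 1, 2, 3, 4, 5, 6],
              [0, 11, 12, 13, 14, 15, 16],
              [0, 21, 22, 23, 24, 25, 26],
              [0, 31, 32, 33, 34, 35, 36],
              [0, 41, 42, 43, 44, 45, 46]])
xx = np.concatenate((x, x + 100), axis=0)  # shape (10, 7)

sequence_length = 2

# sliding_window_view puts the window axis last, so move it to the middle:
# (windows, features, timesteps) -> (windows, timesteps, features)
windows = sliding_window_view(xx, sequence_length, axis=0).transpose(0, 2, 1)

# Column 0 marks the sample (0 or 100); a window is mixed if its first and
# last rows disagree on that marker.
mixed = windows[:, 0, 0] != windows[:, -1, 0]

print(windows.shape)
print(int(mixed.sum()))
print(windows[mixed][0])  # the window that straddles the two samples
```

With the toy data there is exactly one such window (rows 4 and 5), which is the one I want to get rid of.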

I have figured out the solution…

### solution ###

# reshape the dataset to (samples, rows per sample, features)
xxx = xx.reshape(2,5,7)

# create two tensorflow datasets
input_dataset0 = timeseries_dataset_from_array(xxx[0,:,:],
                                               None,
                                               sequence_length,
                                               sequence_stride=stride,
                                               sampling_rate=rate)
input_dataset1 = timeseries_dataset_from_array(xxx[1,:,:],
                                               None,
                                               sequence_length,
                                               sequence_stride=stride,
                                               sampling_rate=rate)
# concatenate the two tensorflow datasets
input_dataset = input_dataset0.concatenate(input_dataset1)

# shape (8, 2, 7): 4 windows per sample, none crossing the sample boundary
x_test = np.concatenate([batch for batch in input_dataset], axis=0)
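
The same idea should extend to more than two samples by building one dataset per sample in a loop and chaining them. The following is a minimal sketch, not a tuned pipeline: windows_per_sample is a hypothetical helper name, and the third sample (x + 200) is made up just to show the loop.

```python
from functools import reduce

import numpy as np
from tensorflow.keras.preprocessing import timeseries_dataset_from_array


def windows_per_sample(samples, sequence_length, stride=1, rate=1):
    """Window each sample on its own, then chain the resulting datasets."""
    datasets = [
        timeseries_dataset_from_array(sample,
                                      None,
                                      sequence_length,
                                      sequence_stride=stride,
                                      sampling_rate=rate)
        for sample in samples
    ]
    # Dataset.concatenate appends one dataset after another
    return reduce(lambda a, b: a.concatenate(b), datasets)


x = np.array([[0, 1, 2, 3, 4, 5, 6],
              [0, 11, 12, 13, 14, 15, 16],
              [0, 21, 22, 23, 24, 25, 26],
              [0, 31, 32, 33, 34, 35, 36],
              [0, 41, 42, 43, 44, 45, 46]])
samples = np.stack([x, x + 100, x + 200])  # shape (3, 5, 7)

dataset = windows_per_sample(samples, sequence_length=2)
windows = np.concatenate([batch.numpy() for batch in dataset], axis=0)
print(windows.shape)  # (12, 2, 7): 4 windows from each of the 3 samples
```

Because samples is only iterated over, a plain Python list of arrays with different lengths should work as well. One thing to keep in mind: every per-sample dataset is batched separately, so the batches do not mix samples and the last batch of each sample can be smaller; calling .unbatch().batch(n) on the result is one way to re-batch across samples if that matters.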

I’ve been looking for help on this question for a while now, and your solution helped me immensely. Thank you for taking the time to post your solution!