Timeseries_dataset_from_array returns future samples instead of previous

Victor_Soby · November 5, 2021, 3:05am

I have a dataframe in my ML project that is initially in the following format.

        Feature 1 | Feature 2| Feature 3 | Feature 4 | etc.
Time 1
Time 2
Time 3
Etc.

I am trying to make this dataframe 3-D, so that each value gains a third dimension (into the screen) holding the values of the same feature at the previous 192 timesteps.

I am trying to use the built-in function keras.preprocessing.timeseries_dataset_from_array() for this, but it returns the opposite of what I'm trying to achieve.

I expect it to return

         | Feature 1 | Feature 2 | Feature 3 | Feature 4 | etc.
Time 192 | [1-192]   | [1-192]   | [1-192]   |           |
Time 193 |           |           |           |           |
Time 194 |           |           |           |           |
Time End |           |           |           |           |

Here it instead returns:

             | Feature 1 | Feature 2 | Feature 3 | Feature 4 | etc.
Time 1       | [192-1]   | [192-1]   | [192-1]   |           |
Time 2       |           |           |           |           |
Time 3       |           |           |           |           |
Time End-192 |           |           |           |           |

Basically, every sample contains the next 192 values instead of the previous 192 values of the dataset. As a result, the output starts 192 samples too early and ends 192 samples before it should.

My code is the following:

# Import path is an assumption; the original post does not show its imports.
from tensorflow import keras

# past is defined as 192
# x_val is the 2-D dataframe
# y_val is one of the columns in the dataframe

dataset_historic_train = keras.preprocessing.timeseries_dataset_from_array(
    x_val,
    y_val,
    sequence_length=past,
    batch_size=len(x_val),
)

Here x_val is my entire 2-D dataframe, indexed from the first to the last sample time, and y_val is my target feature, which is Feature 1 in this case.


Ekaterina_Dranitsyna · November 5, 2021, 8:25am

You pass x and y of equal length to the dataset constructor. When it transforms x using a sliding window of size 192, x yields fewer samples than it has rows, because a full window no longer fits near the end of the DataFrame. Each window is paired with the y value at the window's starting index, so the target lines up with the first row of the window and the last values of y are never used.
To make it work as expected, you should pass x and y[192:]. That skips the first 192 values of y, so each window of 192 rows is paired with the value that immediately follows it.
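
The pairing rule can be checked on a toy series. The following is a minimal plain-Python sketch, not a call into Keras: window_pairs is a hypothetical helper that only mimics the documented rule (the window starting at index i is paired with targets[i], and the output stops at the shorter of the two), and past = 3 stands in for the 192 from the question.

```python
def window_pairs(data, targets, sequence_length):
    """Hypothetical helper: pair the window starting at i with targets[i]."""
    n_windows = len(data) - sequence_length + 1
    # Stop at whichever runs out first: full windows or targets.
    n = min(n_windows, len(targets))
    return [(data[i:i + sequence_length], targets[i]) for i in range(n)]


series = list(range(10))  # toy single-feature series: 0, 1, ..., 9
past = 3                  # stands in for 192 in the question

# Same-length targets: each target is the FIRST value of its window.
same_length = window_pairs(series, series, past)

# Shifted targets: each target is the value right AFTER its window.
shifted = window_pairs(series, series[past:], past)

print(same_length[0])  # ([0, 1, 2], 0)
print(shifted[0])      # ([0, 1, 2], 3)
print(shifted[-1])     # ([6, 7, 8], 9)
```

With the names from the question, the corresponding change would be to pass y_val[past:] as the targets argument. If the target should instead line up with the last row of each window, y_val[past - 1:] would be the slice to try.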
