Hi all. Maybe a newbie question here, but I’ve not had much experience with sequential models and I’ve not been able to find an example or clear answers to this question online.
All the tutorials and resources I have found discuss building RNN variants using only a single time series as input. But what if I have multiple (overlapping) time series? What is the correct way to go about this? For example, house price data resampled to monthly periods, where I want to predict the next month’s price (y) from multivariate data (X).
Is this the correct approach?
- For every house, split that house’s data into training, validation and test subsets (indexed by timestamp and sorted chronologically).
- (scikit-learn transformation step) Fit the transformer on the concatenated training data from all houses. Then, for every house, transform its X matrix.
- For every house, for every subset of data (train, val, test), concatenate X and y horizontally (i.e., ‘Xy’) and split it into sequences using a sliding window.
This will then give me an array of sequences, where each index in the array contains sequence data for a unique house. Each sequence is of the same length.
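To make the steps above concrete, here is a minimal sketch on synthetic data. Everything named here is hypothetical (`houses`, `chronological_split`, `sliding_windows`, the sizes), and a plain mean/std standardisation stands in for whichever scikit-learn transformer is actually used; the point is only that the statistics are fitted on the pooled training rows and then applied per house.

```python
import numpy as np

def chronological_split(data, n_val, n_test):
    """Split time-sorted rows into train/val/test without shuffling."""
    cut = n_val + n_test
    return data[:-cut], data[-cut:-n_test], data[-n_test:]

def sliding_windows(xy, window):
    """Turn a (T, F) array into (T - window + 1, window, F) overlapping windows."""
    return np.stack([xy[i:i + window] for i in range(len(xy) - window + 1)])

# Hypothetical data: X is (months, features), y is (months,) for each house.
rng = np.random.default_rng(0)
n_houses, n_months, n_features, window = 3, 40, 4, 6
houses = {h: (rng.normal(size=(n_months, n_features)), rng.normal(size=n_months))
          for h in range(n_houses)}

# Step 1: per-house chronological split of the horizontally stacked 'Xy'.
splits = {}
for h, (X, y) in houses.items():
    Xy = np.column_stack([X, y])  # y becomes the last column
    parts = chronological_split(Xy, n_val=8, n_test=8)
    splits[h] = dict(zip(("train", "val", "test"), parts))

# Step 2: fit the scaling statistics on the concatenated training X only.
train_X_all = np.concatenate([s["train"][:, :-1] for s in splits.values()])
mean, std = train_X_all.mean(axis=0), train_X_all.std(axis=0)

# Step 3: transform every house's X, then cut each subset into windows.
house_sequences = []
for h, s in splits.items():
    seqs = {}
    for name, xy in s.items():
        scaled = xy.copy()
        scaled[:, :-1] = (xy[:, :-1] - mean) / std  # y column is left unscaled
        seqs[name] = sliding_windows(scaled, window)
    house_sequences.append(seqs)
```

With these sizes, each house ends up with 19 training windows and 3 validation and 3 test windows, each of shape (6, 5).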
At this point, is it as simple as calling the .fit() method for every house (updating the weights each time)? How does the model know that I am feeding in a new sequence, and that the time series ‘starting date’ has therefore reset back to timestamp 0? An epoch is usually defined as one pass over the entire dataset, but here the dataset passed to each fit() call is only one house’s sequences, so each call would train the RNN on a single house for that many epochs, rather than on all the houses’ sequential datasets. Would I have to loop over the array of sequences, call .fit() on each one with epochs=1, and define my own epoch loop? E.g.:
model = myRNNmodel()
# 5 epochs
for epoch in range(5):
    # for every house sequence
    for s in house_sequences:
        # update model weights for that house, do not loop back over
        model.fit(s['train'], validation_data=s['val'], epochs=1)
        # model.reset_weights() # ? reset to 'beginning' of timestamps?
Hopefully this makes sense. In essence, I want to understand how to feed this data into any variant of RNN:
RNN input (train) = [
house 1 : [ sequence-t0, sequence-t1, …, sequence-t330],
house 2 : [ sequence-t0, sequence-t1, …, sequence-t330],
…,
house 6000 : [ sequence-t0, sequence-t1, …, sequence-t330]
]
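One way to read the layout above, sketched under assumptions: if the model is Keras-style and its recurrent layers are left at the default `stateful=False`, the hidden state starts fresh for every sample, so each window is treated as an independent sequence. In that case the windows from all houses could be pooled along the sample axis and passed to a single fit call. The names and sizes below are hypothetical, and taking the last timestep's final column as the target is an assumption about how the windows are used.

```python
import numpy as np

# Hypothetical stand-in for the nested layout: one array of windows per house,
# each of shape (n_windows, timesteps, n_features + 1), target in the last column.
rng = np.random.default_rng(0)
n_houses, n_windows, timesteps, n_cols = 4, 10, 6, 5
train_by_house = [rng.normal(size=(n_windows, timesteps, n_cols))
                  for _ in range(n_houses)]

# Pool the windows from every house along the sample axis.
pooled = np.concatenate(train_by_house, axis=0)  # (houses * windows, timesteps, cols)

# Assumed convention: the earlier timesteps are the input,
# the price at the final timestep is the target.
X_train = pooled[:, :-1, :]
y_train = pooled[:, -1, -1]

# Shuffle whole windows; the order inside each window is untouched.
perm = rng.permutation(len(pooled))
X_train, y_train = X_train[perm], y_train[perm]
```

Under those assumptions, one epoch over `X_train` covers every house, and no manual loop or state reset between houses would be needed.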