I am trying to build a hierarchical sequence model for time series classification (following the paper Hierarchical Attention Networks for Document Classification), but I am confused about how to mask the hierarchical sequences.
My data is a hierarchical time series. Specifically, each sample is composed of multiple sub-sequences, and each sub-sequence is a multivariate time series (analogous to word -> sentence -> document in NLP). So I need to pad and mask the data twice: a sample will often not contain the same number of sub-sequences, nor will every sub-sequence contain the same number of time steps. Finally, I get data of shape (2, 2, 8, 2) as follows:
array([[[[0.21799476, 0.26063576],
         [0.2170655 , 0.53772384],
         [0.18505535, 0.30702454],
         [0.22714901, 0.17020395],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ]],

        [[0.2160176 , 0.23789616],
         [0.2675753 , 0.21807681],
         [0.26932836, 0.21914595],
         [0.26932836, 0.21914595],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ]]],


       [[[0.03941338, 0.3380829 ],
         [0.04766269, 0.3031088 ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ]],

        [[0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ]]]], dtype=float32)
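For reference, a doubly padded array like the one above can be produced with a helper along these lines (a minimal numpy-only sketch; `pad_hierarchical` and the toy `samples` are hypothetical names, not my actual pipeline):

```python
import numpy as np

def pad_hierarchical(samples, maxlen_event, maxlen_seq, n_features):
    """Zero-pad ragged hierarchical data on both levels (post-padding).

    samples: list of samples; each sample is a list of sub-sequences,
             each sub-sequence an array of shape (n_steps, n_features).
    Returns an array of shape (n_samples, maxlen_event, maxlen_seq, n_features).
    """
    out = np.zeros((len(samples), maxlen_event, maxlen_seq, n_features),
                   dtype='float32')
    for i, subs in enumerate(samples):
        # First level: cap/pad the number of sub-sequences per sample.
        for j, seq in enumerate(subs[:maxlen_event]):
            # Second level: cap/pad the number of steps per sub-sequence.
            steps = min(len(seq), maxlen_seq)
            out[i, j, :steps] = np.asarray(seq, dtype='float32')[:steps]
    return out

samples = [
    [np.random.rand(4, 2), np.random.rand(4, 2)],  # two sub-sequences
    [np.random.rand(2, 2)],                        # one sub-sequence
]
data = pad_hierarchical(samples, maxlen_event=2, maxlen_seq=8, n_features=2)
print(data.shape)  # (2, 2, 8, 2)
```

Fully padded sub-sequences (like the second sample's second sub-sequence) end up as all-zero blocks, which is exactly the second-level padding I am asking about.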
Then I build a hierarchical model as follows:
from tensorflow.keras.layers import Input, Dense, LSTM, Masking, TimeDistributed
from tensorflow.keras.models import Model, Sequential

# maxlen_event = max sub-sequences per sample, maxlen_seq = max steps per sub-sequence
inputs = Input(shape=(maxlen_event, maxlen_seq, 2))
# Inner encoder: an LSTM applied to each sub-sequence, with zero-padded steps masked
x = TimeDistributed(
    Sequential([
        Masking(mask_value=0.0),
        LSTM(units=8, return_sequences=False)
    ])
)(inputs)
# Outer LSTM over the sequence of sub-sequence encodings
x = LSTM(units=32, return_sequences=False)(x)
x = Dense(16, activation='relu')(x)
output = Dense(16, activation='sigmoid')(x)
model = Model(inputs, output)
As my data is padded on both levels, I don't know how to mask it correctly. I have two questions:
Q1: Inside TimeDistributed, am I using the Masking layer correctly to mask the first (step-level) padding?
Q2: How do I mask the second (sub-sequence-level) padding, i.e. the all-zero sub-sequences?
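For context, my understanding of the first-level masking is based on a toy check like this (a sketch with made-up tensor values): Masking marks a time step as padding only when every feature at that step equals mask_value.

```python
import tensorflow as tf

# Toy batch: one sample, two time steps of 2 features; the second step is all zeros.
x = tf.constant([[[1.0, 2.0],
                  [0.0, 0.0]]])

masking = tf.keras.layers.Masking(mask_value=0.0)
# compute_mask returns True for real steps and False for padded ones.
mask = masking.compute_mask(x)
print(mask.numpy())  # [[ True False]]
```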
Thank you.