Hello!
I have variable-length sequences, so I need to pad and mask them to a fixed number of timesteps.
I was wondering how exactly the Bidirectional layer handles the masked timesteps when merging the outputs of the forward and backward LSTMs (the LSTM has return_sequences=True).
For example, suppose an input sequence is [1.0, 2.0, 3.0], and I pad it to length 5 with -1.0, so it becomes [1.0, 2.0, 3.0, -1.0, -1.0]. I use the Masking layer to mask the last two timesteps, and then feed the masked sequence to the Bidirectional(LSTM) like the following:
output = Bidirectional(LSTM(1, return_sequences=True))(masked_input)
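
For concreteness, here is a minimal, self-contained version of my setup (the shapes and the -1.0 mask value are just the ones from the example above; I am using TensorFlow's Keras):

from tensorflow.keras.layers import Input, Masking, Bidirectional, LSTM
from tensorflow.keras.models import Model

# Variable-length scalar sequences; -1.0 marks the padded timesteps.
inputs = Input(shape=(None, 1))
masked_input = Masking(mask_value=-1.0)(inputs)
# merge_mode defaults to 'concat', so each timestep's output has 2 units.
output = Bidirectional(LSTM(1, return_sequences=True))(masked_input)
model = Model(inputs, output)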
Suppose the output of the forward LSTM is [0.1, 0.2, 0.3, 0.0, 0.0], since the last two timesteps are masked, and the output of the backward LSTM is [0.0, 0.0, 0.4, 0.5, 0.6]. When using concatenate mode (merge_mode='concat'), will the Bidirectional layer merge these two outputs like the following?
[[0.1, 0.0], [0.2, 0.0], [0.3, 0.4], [0.0, 0.5], [0.0, 0.6]]
Or will it merge them like the following?
[[0.1, 0.4], [0.2, 0.5], [0.3, 0.6], [0.0, 0.0], [0.0, 0.0]]
I hope it is the second case.
If it is the first case, the result would be different from directly using the input [1.0, 2.0, 3.0] without padding and masking, and that is not what we want. It would be especially bad when the padding is longer than the original sequence (e.g., a length-3 sequence padded to length 7): every valid output value of one LSTM would be concatenated with a 0.0 from the other, because the two valid ranges would not even overlap.
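
One way to check which case holds (a sketch, reusing the model from the snippet above) is to run the same model on the padded and the unpadded input and compare the valid timesteps:

import numpy as np

padded = np.array([[[1.0], [2.0], [3.0], [-1.0], [-1.0]]])
unpadded = np.array([[[1.0], [2.0], [3.0]]])

out_padded = model.predict(padded)      # shape (1, 5, 2)
out_unpadded = model.predict(unpadded)  # shape (1, 3, 2)

# In the second (desired) case, the first three timesteps of the
# padded output should match the unpadded output.
print(np.allclose(out_padded[0, :3], out_unpadded[0], atol=1e-6))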
I would appreciate your advice on how the Bidirectional layer in Keras performs the merge when there are masked timesteps.
Thank you very much!