I am going through the Transformer tutorial code on tensorflow.org.
def create_masks(self, inp, tar):
    # Encoder padding mask (Used in the 2nd attention block in the decoder too.)
    padding_mask = create_padding_mask(inp)

    # Used in the 1st attention block in the decoder.
    # It is used to pad and mask future tokens in the input received by
    # the decoder.
    look_ahead_mask = create_look_ahead_mask(tf.shape(tar)[1])
    dec_target_padding_mask = create_padding_mask(tar)
    look_ahead_mask = tf.maximum(dec_target_padding_mask, look_ahead_mask)

    return padding_mask, look_ahead_mask
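For context, the two helper functions are defined earlier in the same tutorial; as far as I can tell they look roughly like this (paraphrased from the tutorial, so treat this as a sketch rather than the exact code):

import tensorflow as tf

def create_padding_mask(seq):
    # 1.0 where the token is padding (id 0), 0.0 elsewhere.
    seq = tf.cast(tf.math.equal(seq, 0), tf.float32)
    # Extra dims so the mask broadcasts over the attention logits:
    # shape (batch_size, 1, 1, seq_len).
    return seq[:, tf.newaxis, tf.newaxis, :]

def create_look_ahead_mask(size):
    # Upper-triangular matrix of 1s: position i may not attend to positions j > i.
    return 1 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)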
The Transformer class has a method called create_masks, which creates the padding and look-ahead masks. I understand that the padding mask for the encoder should be created from the input sequence (the input to the encoder). What I do not understand is why that same encoder input sequence should also be used to create the padding mask for the second attention block of the decoder (the first line of the method body above). I would have expected the padding mask for the decoder to be created from the target sequence (which is fed to the decoder).
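To make my confusion concrete, here is a small example with made-up lengths (encoder input of length 5, decoder target of length 7); the shapes assume the helper sketch above:

inp = tf.constant([[5, 7, 2, 0, 0]])         # encoder input, padded to length 5
tar = tf.constant([[3, 9, 4, 6, 1, 0, 0]])   # decoder target, padded to length 7

padding_mask = create_padding_mask(inp)                  # shape (1, 1, 1, 5) -- tied to the encoder input length
look_ahead_mask = create_look_ahead_mask(tf.shape(tar)[1])   # shape (7, 7)
dec_target_padding_mask = create_padding_mask(tar)       # shape (1, 1, 1, 7)
look_ahead_mask = tf.maximum(dec_target_padding_mask, look_ahead_mask)  # broadcasts to (1, 1, 7, 7)

So the mask that ends up in the decoder's second attention block has the encoder input's length, not the target's length, which is the part I find confusing.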
Please help me understand why this is done.