From the documentation on tf.keras.layers.Embedding:
input_dim: Integer. Size of the vocabulary, i.e. maximum integer index + 1.

mask_zero: Boolean, whether or not the input value 0 is a special “padding” value that should be masked out. This is useful when using recurrent layers which may take variable length input. If this is True, then all subsequent layers in the model need to support masking or an exception will be raised. If mask_zero is set to True, as a consequence, index 0 cannot be used in the vocabulary (input_dim should equal size of vocabulary + 1).
- If my vocabulary size is `n` but the tokens are encoded with index values from 1 to `n` (0 is reserved for padding, as in the sketch below), is `input_dim` equal to `n` or `n + 1`? The "maximum integer index + 1" part of the documentation is confusing me.
- If the inputs are padded with zeroes, what are the consequences of leaving `mask_zero = False`?
- If `mask_zero = True`, do I then, based on the documentation, have to increment the answer to my first question by one? What is the expected behaviour if this is not done?