There are several mistakes in the comments of the image_captioning tutorial.
For example, in the definition of RNN_Decoder, there is a comment:
shape == (batch_size, max_length, hidden_size)
above the following code:
x = self.fc1(output)
However, it should be
shape == (batch_size, 1, units).
The reasons are as follows.
The second component of the shape should be 1, the same as the second component of output.shape.
Because x corresponds to a single word in a sentence, max_length should not appear. In addition, according to
self.fc1 = tf.keras.layers.Dense(self.units),
the last component should be units rather than hidden_size.
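
The shape reasoning above can be checked with a small sketch. This is plain Python shape bookkeeping, not TensorFlow code: dense_output_shape is a hypothetical helper that only encodes the rule that a Dense layer replaces the last axis with its number of units, and the values of batch_size and units are illustrative assumptions.

```python
def dense_output_shape(input_shape, units):
    """Shape produced by a Dense(units) layer: only the last axis changes.

    Hypothetical helper for illustration; it is not part of TensorFlow.
    """
    return tuple(input_shape[:-1]) + (units,)


# Illustrative values (assumptions, not taken from the tutorial).
batch_size, units = 64, 512

# The decoder processes one word per call, so the time axis of output is 1.
output_shape = (batch_size, 1, units)

# x = self.fc1(output), with self.fc1 = tf.keras.layers.Dense(self.units)
x_shape = dense_output_shape(output_shape, units)
print(x_shape)  # (64, 1, 512), i.e. (batch_size, 1, units)
```

Neither max_length nor hidden_size enters the computation: the middle axis is carried over from output, and the last axis is set by the units argument of the Dense layer.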