Why is training a model with input shape (Batch, 90, 7) slower than with (Batch, 90, 8)?

I am using TensorFlow (Keras) with the following model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Masking, LSTM, Dense

model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(90, 7)))
model.add(LSTM(100, return_sequences=True))
model.add(LSTM(70, return_sequences=True))
model.add(LSTM(70, return_sequences=False))
model.add(Dense(20, activation='relu'))
model.add(Dense(22, activation='softmax'))

When the first layer is:
model.add(Masking(mask_value=0.0, input_shape=(90, 7)))
the training time per epoch is slower than when the first layer is:
model.add(Masking(mask_value=0.0, input_shape=(90, 8)))

It seems that the GPU handles the larger input faster.
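
A minimal sketch of how the timing comparison can be reproduced (random data stands in for my real dataset; the build_model helper, sample counts, and batch size here are illustrative assumptions, not my exact setup):

import time
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Masking, LSTM, Dense

def build_model(n_features):
    model = Sequential()
    model.add(Masking(mask_value=0.0, input_shape=(90, n_features)))
    model.add(LSTM(100, return_sequences=True))
    model.add(LSTM(70, return_sequences=True))
    model.add(LSTM(70, return_sequences=False))
    model.add(Dense(20, activation='relu'))
    model.add(Dense(22, activation='softmax'))
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
    return model

for n_features in (7, 8):
    # Random data for timing only (no zero timesteps, so nothing is masked).
    x = np.random.rand(1024, 90, n_features).astype('float32')
    y = np.random.randint(0, 22, size=(1024,))
    model = build_model(n_features)
    start = time.time()
    model.fit(x, y, epochs=1, batch_size=64, verbose=0)
    print(n_features, 'features:', time.time() - start, 'seconds/epoch')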

  1. Why is this?
  2. Is it better to use model.add(Masking(mask_value=0.0, input_shape=(90, 8))) and pad the last dimension of the input with zeros (see the sketch below)?
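
To clarify what I mean in question 2, here is a minimal sketch of the zero-padding idea (the array x and its shape are illustrative assumptions). Appending an all-zero eighth feature leaves the Masking layer's behavior unchanged, since a timestep is masked only when all of its features equal mask_value, and previously all-zero timesteps stay all-zero:

import numpy as np

x = np.random.rand(32, 90, 7).astype('float32')  # original input (batch, 90, 7)
x_padded = np.pad(x, ((0, 0), (0, 0), (0, 1)))   # zero-pad last dim -> (32, 90, 8)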