Why is training a model with input (Batch, 90, 7) slower than with (Batch, 90, 8)?

I am using TensorFlow with the following model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Masking, LSTM, Dense

model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(90, 7)))
model.add(LSTM(100, return_sequences=True))
model.add(LSTM(70, return_sequences=True))
model.add(LSTM(70, return_sequences=False))
model.add(Dense(20, activation='relu'))
model.add(Dense(22, activation='softmax'))

When the first layer is:
model.add(Masking(mask_value=0.0, input_shape=(90, 7)))
the training time (for each epoch) is slower than when the first layer is:
model.add(Masking(mask_value=0.0, input_shape=(90, 8)))

It seems that when the input is larger, the GPU handles it faster.

  1. Why is that?
  2. Is it better to use model.add(Masking(mask_value=0.0, input_shape=(90, 8))) and pad the last dimension of the input with zeros?

Hi @SAL,

Apologies for the late reply.

GPUs are generally more efficient with larger and, in particular, better-aligned input sizes, because they perform many operations in parallel; a feature dimension of 8 tends to map onto the GPU's memory-access patterns and matrix-multiply kernels better than 7, so the padded input can run faster per step. (Larger batch sizes can also stabilise gradients and improve convergence, but that is a separate effect from the feature-dimension difference you are seeing.)
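If you want to verify the difference on your own hardware, here is a minimal timing sketch. Only the layer sizes mirror your model; the random data, sample count, and batch size are arbitrary placeholders, so treat the absolute numbers as illustrative only:

import time
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Masking, LSTM, Dense

def build_model(n_features):
    # Same architecture as above, parameterised by the feature dimension.
    model = Sequential([
        Masking(mask_value=0.0, input_shape=(90, n_features)),
        LSTM(100, return_sequences=True),
        LSTM(70, return_sequences=True),
        LSTM(70, return_sequences=False),
        Dense(20, activation='relu'),
        Dense(22, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
    return model

def time_one_epoch(n_features, n_samples=2048, batch_size=64):
    # Random placeholder data just for timing; labels are the 22 classes.
    x = np.random.rand(n_samples, 90, n_features).astype('float32')
    y = np.random.randint(0, 22, size=(n_samples,))
    model = build_model(n_features)
    model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)  # warm-up epoch
    start = time.time()
    model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)
    return time.time() - start

print('7 features:', time_one_epoch(7), 's per epoch')
print('8 features:', time_one_epoch(8), 's per epoch')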

Yes, you can use model.add(Masking(mask_value=0.0, input_shape=(90, 8))) and zero-pad the last feature dimension of your inputs, but make sure the change doesn't hurt training by comparing the validation loss against the original setup.
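The padding itself is just a zero column appended to each timestep. A minimal sketch with placeholder data (x_train stands in for your real training array of shape (num_samples, 90, 7)):

import numpy as np

x_train = np.random.rand(1000, 90, 7).astype('float32')  # placeholder data

# Pad only the last axis: (before, after) = (0, 1) appends one all-zero feature.
x_train_padded = np.pad(x_train, pad_width=((0, 0), (0, 0), (0, 1)),
                        mode='constant', constant_values=0.0)
print(x_train_padded.shape)  # (1000, 90, 8)

Note that the extra zero column does not change which timesteps the Masking layer skips: a timestep is masked only when all of its features equal mask_value, so timesteps that were all-zero in the 7-feature input stay all-zero (and masked) after padding, and non-zero timesteps stay unmasked.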

Thank You.