Hi! I have been building NLP models with TensorFlow, with and without Keras, for some time. I ran into an issue with Keras and TensorFlow while trying to do the following:
input = tf.keras.layers.Input(shape=(MAX_WORDS, MAX_CHARS))
char_embedded = tf.keras.layers.Embedding(...)(input)
So now char_embedded has shape (dynamic batch size, MAX_WORDS, MAX_CHARS, embedding_dim), and doing either of the following throws an error:
tf.keras.layers.LSTM(...)(char_embedded)
tf.keras.layers.Conv1D(...)(char_embedded)
The error occurs because the tensor is 4-dimensional, while both Conv1D and RNNs expect a 3-dimensional input. This setup is common in NER tagging, and in NLP generally for handling unknown words: the resulting char digest per word is then concatenated into the word embeddings.
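To make the shape mismatch concrete, here is a minimal snippet that reproduces it (all constants are illustrative placeholders):

import tensorflow as tf

MAX_WORDS, MAX_CHARS = 50, 20    # placeholder sequence lengths
CHAR_VOCAB, CHAR_DIM = 100, 32   # placeholder vocab and embedding sizes

inp = tf.keras.layers.Input(shape=(MAX_WORDS, MAX_CHARS))
emb = tf.keras.layers.Embedding(CHAR_VOCAB, CHAR_DIM)(inp)
# emb is 4-D: (None, MAX_WORDS, MAX_CHARS, CHAR_DIM)
tf.keras.layers.LSTM(64)(emb)  # raises an error along the lines of "expected ndim=3, found ndim=4"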
I have a workaround for this:
lstm = tf.keras.layers.LSTM(...)
digests = []
for text in tf.unstack(char_embedded):
    digests.append(lstm(text))
digests = tf.stack(digests)
Now digests holds what I wanted, with shape (FIXED_BATCH_SIZE, MAX_WORDS, output_dim).
For unstack to work, it needs to know the batch size while building the graph, so I'm forced to make it constant:
input = tf.keras.layers.Input(shape=(MAX_WORDS, MAX_CHARS), batch_size=FIXED_BATCH_SIZE)
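Putting the pieces together, the whole workaround looks roughly like this (a minimal sketch; layer sizes and constants are placeholders):

import tensorflow as tf

FIXED_BATCH_SIZE, MAX_WORDS, MAX_CHARS = 32, 50, 20  # placeholders
CHAR_VOCAB, CHAR_DIM, LSTM_UNITS = 100, 32, 64       # placeholders

input = tf.keras.layers.Input(shape=(MAX_WORDS, MAX_CHARS),
                              batch_size=FIXED_BATCH_SIZE)
char_embedded = tf.keras.layers.Embedding(CHAR_VOCAB, CHAR_DIM)(input)

lstm = tf.keras.layers.LSTM(LSTM_UNITS)  # one layer instance, so every example shares weights
digests = []
for text in tf.unstack(char_embedded):   # FIXED_BATCH_SIZE tensors of (MAX_WORDS, MAX_CHARS, CHAR_DIM)
    digests.append(lstm(text))           # MAX_WORDS acts as the batch here -> (MAX_WORDS, LSTM_UNITS)
digests = tf.stack(digests)              # (FIXED_BATCH_SIZE, MAX_WORDS, LSTM_UNITS)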
This whole solution has several disadvantages:
- It makes the code messy
- The TF graph looks awful in TensorBoard and is slower to build, so I assume it's a mess there as well
- It forces me to train and predict with an amount of data divisible by the batch size, which is annoying because I constantly have to pad and unpad things
- If I have to predict for a single item, I have to pad it up to FIXED_BATCH_SIZE and pay the time cost of predicting a full batch
The last point has a further workaround: train with the fixed-batch-size graph, then use the same layers to build a second graph fixed at batch size 1. But this adds complexity to the code.
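For reference, that trick looks roughly like this (a sketch reusing the placeholder constants above; the key point is that both graphs are built from the same layer objects, so they share weights):

embedding = tf.keras.layers.Embedding(CHAR_VOCAB, CHAR_DIM)
lstm = tf.keras.layers.LSTM(LSTM_UNITS)

def build_model(batch_size):
    inp = tf.keras.layers.Input(shape=(MAX_WORDS, MAX_CHARS), batch_size=batch_size)
    digests = tf.stack([lstm(t) for t in tf.unstack(embedding(inp))])
    return tf.keras.Model(inp, digests)

train_model = build_model(FIXED_BATCH_SIZE)  # used for fit()
infer_model = build_model(1)                 # same weights, single-item predict()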
I have been using this workaround for two years now, and it works; I have had no problems with it other than the ones listed. But I have always thought there must be a better way, probably a much more straightforward and simple one. So, is there?
Thanks,
Gianmarco.