I’m doing text classification on the IMDB reviews dataset from tensorflow_datasets. A network with an embedding layer, averaging over the temporal dimension with layers.GlobalAveragePooling1D, and a dense output layer does fairly well. But if I swap the averaging out for an LSTM layer (which accepts the same 3-D input and, like the pooling layer, returns a 2-D output, so the rest of the network can stay unchanged), the accuracy stays very close to 50%! I’m really confused as to why this happens. Below is code that anyone should be able to run; if you have any idea why this is happening, please share!
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, losses
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
import tensorflow_datasets as tfds

# Load the full IMDB reviews dataset as (text, label) arrays.
train_data, test_data = tfds.load(
    name="imdb_reviews", split=('train', 'test'),
    batch_size=-1, as_supervised=True)
xTrain, yTrain = tfds.as_numpy(train_data)
xTest, yTest = tfds.as_numpy(test_data)

# Map each review to a fixed-length sequence of 500 integer token ids,
# using a vocabulary of the 5000 most frequent tokens.
max_features = 5000
max_len = 500
vectorize_layer = TextVectorization(
    max_tokens=max_features, output_mode='int',
    output_sequence_length=max_len)
vectorize_layer.adapt(xTrain)
xTrainV = vectorize_layer(xTrain)
xTestV = vectorize_layer(xTest)

# Embed tokens, average over the time dimension, and classify with a
# single logit. The commented-out LSTM line is the variant that fails.
model = tf.keras.Sequential([
    layers.Embedding(max_features + 1, 16),
    layers.GlobalAveragePooling1D(),
    #layers.LSTM(8),
    layers.Dense(1)])

model.compile(loss=losses.BinaryCrossentropy(from_logits=True),
              optimizer='adam',
              metrics=tf.metrics.BinaryAccuracy(threshold=0))

model.fit(xTrainV, yTrain, validation_data=(xTestV, yTest), epochs=10)
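As a quick sanity check (not part of the training script, and the expected shapes are just what I anticipate from the 25,000-review splits and max_len=500), the vectorized inputs are fixed-length integer sequences, and without masking GlobalAveragePooling1D should behave like a plain mean over the time axis:

# Sanity-check sketch: inspect input shapes and confirm that, with no mask,
# GlobalAveragePooling1D is equivalent to averaging over axis 1.
print(xTrainV.shape, xTestV.shape)   # I expect (25000, 500) and (25000, 500)
emb = layers.Embedding(max_features + 1, 16)
pooled = layers.GlobalAveragePooling1D()(emb(xTrainV[:4]))
manual = tf.reduce_mean(emb(xTrainV[:4]), axis=1)
print(np.allclose(pooled.numpy(), manual.numpy()))  # True: same averaging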
The LSTM and GlobalAveragePooling1D layers can be swapped freely. An example progression of validation accuracies with GlobalAveragePooling1D over 10 epochs is: 0.65, 0.71, 0.67, 0.76, 0.77, 0.79, 0.80, 0.80, 0.81, 0.83.
If the LSTM layer is used instead, both the validation accuracies and the training accuracies sit at 0.49, 0.50, or 0.51 after every epoch. There is no improvement and the network appears to be predicting at chance. Why?
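For completeness, the failing variant is identical to the script above except for the model definition (this is just the commented-out LSTM line swapped in; nothing else changes):

# LSTM variant: same data pipeline, only the pooling layer is replaced.
model = tf.keras.Sequential([
    layers.Embedding(max_features + 1, 16),
    layers.LSTM(8),          # replaces GlobalAveragePooling1D()
    layers.Dense(1)])

model.compile(loss=losses.BinaryCrossentropy(from_logits=True),
              optimizer='adam',
              metrics=tf.metrics.BinaryAccuracy(threshold=0))
model.fit(xTrainV, yTrain, validation_data=(xTestV, yTest), epochs=10)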