Hi, I am working on a project where I am trying to predict a pathology from patient anamnesis. The vocab size is 55000, and there are 3 classes.
I use an embedding layer before an LSTM/GRU layer.
What is the best embedding dimension for this case?
Hi @Francesca_Pisani .
The documentation for keras.layers.Embedding is here.
In your case, the embedding layer will probably look like this (note `input_dim` should match your vocabulary size of 55000):

```python
tf.keras.layers.Embedding(input_dim=55000,              # Number of words in the vocabulary
                          output_dim=EMBEDDING_DIMS,    # Dimension of the dense embedding
                          input_length=MAX_LEN)         # Length of the longest input sequence
```
Note the number of classes (3) is something you’ll take into account in another layer down the road.
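To make that concrete, here is a minimal sketch of where the embedding, the LSTM/GRU, and the 3-class output layer fit together. This assumes TensorFlow 2.x; `EMBEDDING_DIMS`, `MAX_LEN`, and the LSTM width of 64 are placeholder values you would tune for your data, not recommendations from the thread:

```python
import tensorflow as tf

VOCAB_SIZE = 55000    # number of words in the vocabulary (from the question)
EMBEDDING_DIMS = 128  # dimension of the dense embedding (placeholder value)
MAX_LEN = 200         # length of the longest padded input sequence (placeholder)
NUM_CLASSES = 3       # the 3 pathology classes

model = tf.keras.Sequential([
    # Maps each token id to a dense EMBEDDING_DIMS-dimensional vector
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE,
                              output_dim=EMBEDDING_DIMS),
    # Recurrent layer over the sequence; tf.keras.layers.GRU(64) also works
    tf.keras.layers.LSTM(64),
    # This is the layer "down the road" where the 3 classes come in
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

With integer class labels (0, 1, 2), `sparse_categorical_crossentropy` avoids one-hot encoding the targets; the model's output is a `(batch, 3)` tensor of class probabilities.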