ValueError: `logits` and `labels` must have the same shape, received ((None, 1) vs (None, 200))

I tried to train a convolutional neural network to predict the labels (categorical data) from the criteria (text). This should be a simple classification problem: there are 7 labels, so my network has 7 output neurons with sigmoid activation functions.

I encoded the training data in a txt file using the following simple format, with text descriptors ('criteria') and categorical label variables ('label'):

'criteria'|'label'

Here’s a peek at one entry from the data file:

Headache location: Bilateral (intracranial). Facial pain: Nil. Pain quality: Pulsating. Thunderclap onset: Nil. Pain duration: 11. Pain episodes per month: 26. Chronic pain: No. Remission between episodes: Yes. Remission duration: 25. Pain intensity: Moderate (4-7). Aggravating/triggering factors: Innocuous facial stimuli, Bathing and/or showering, Chocolate, Exertion, Cold stimulus, Emotion, Valsalva maneuvers. Relieving factors: Nil. Headaches worse in the mornings and/or night: Nil. Associated symptoms: Nausea and/or vomiting. Reversible symptoms: Nil. Examination findings: Nil. Aura present: Yes. Reversible aura: Motor, Sensory, Brainstem, Visual. Duration of auras: 47. Aura in relation to headache: Aura proceeds headache. History of CNS disorders: Multiple Sclerosis, Angle-closure glaucoma. Past history: Nil. Temporal association: No. Disease worsening headache: Nil. Improved cause: Nil. Pain ipsilateral: Nil. Medication overuse: Nil. Establish drug overuse: Nil. Investigations: Nil.|Migraine with aura

Here’s a snippet of the code from the training algorithm:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

'''A. IMPORT DATA'''
dataset = pd.read_csv('Data/ICHD3_Database.txt', names=['criteria', 'label'], sep='|')
features = dataset['criteria'].values
labels = dataset['label'].values

'''B. DATA PRE-PROCESSING: BAG OF WORDS (BOW) MODEL'''
def BOW_Model(features):
    features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.33, random_state=42) 
    vectorizer = CountVectorizer() 
    features_train = vectorizer.fit_transform(features_train) 
    features_test = vectorizer.transform(features_test) 
    return features_train, features_test, labels_train, labels_test

'''B. DATA PRE-PROCESSING: WORD EMBEDDINGS'''
def word_embeddings(features):
    maxlen = 200
    features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.33, random_state=42) 
    tokenizer = Tokenizer(num_words=5000)
    tokenizer.fit_on_texts(features_train)
    features_train = pad_sequences(tokenizer.texts_to_sequences(features_train), padding='post', maxlen=maxlen)
    features_test = pad_sequences(tokenizer.texts_to_sequences(features_test), padding='post', maxlen=maxlen) 
    vocab_size = len(tokenizer.word_index) + 1  # Adding 1 because of reserved 0 index
    tokenizer.fit_on_texts(labels_train)
    labels_train = pad_sequences(tokenizer.texts_to_sequences(labels_train), padding='post', maxlen=maxlen)
    labels_test = pad_sequences(tokenizer.texts_to_sequences(labels_test), padding='post', maxlen=maxlen)
    vocab_size += len(tokenizer.word_index) + 1  # Adding 1 because of reserved 0 index
    return features_train, features_test, labels_train, labels_test, vocab_size, maxlen

features_train, features_test, labels_train, labels_test, vocab_size, maxlen = word_embeddings(features) # Pre-process text using word embeddings

'''C. CREATE THE MODEL'''
def design_model(features, hidden_layers=2, number_neurons=128):
    model = Sequential(name = "My_Sequential_Model") 
    model.add(layers.Embedding(input_dim=vocab_size, output_dim=50, input_length=maxlen)) 
    model.add(layers.Conv1D(128, 5, activation='relu'))
    model.add(layers.GlobalMaxPool1D()) 
    for i in range(hidden_layers): 
        model.add(Dense(number_neurons, activation='relu')) 
        model.add(Dropout(0.2)) 
    model.add(Dense(7, activation='sigmoid')) 
    opt = Adam(learning_rate=0.01) 
    model.compile(loss='binary_crossentropy', metrics=['mae'], optimizer=opt)
    return model

'''E. TRAIN THE MODEL'''
model = design_model(features_train, hidden_layers=2, number_neurons=30) 
history = model.fit(features_train, labels_train, epochs=10, batch_size=16, verbose=0, validation_split=0.33, callbacks=[EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=20)]) 

But when I run the model, I get the following error:

Traceback (most recent call last):
  File "c:\Users\user\Desktop\Deep Learning\deep_learning_headache.py", line 112, in <module>
    history = model.fit(features_train, labels_train, epochs=10, batch_size=16, verbose=0, validation_split=0.33, callbacks=[EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=20)]) # 18. Fit model using optimized epochs & batch size. When the training performance reaches the plateau or starts degrading, the learning stops.
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\user\AppData\Local\Temp\__autograph_generated_file6x9w264i.py", line 15, in tf__train_function
    retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
ValueError: in user code:

    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\engine\training.py", line 1401, in train_function  *
        return step_function(self, iterator)
    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\engine\training.py", line 1384, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\engine\training.py", line 1373, in run_step  **
        outputs = model.train_step(data)
    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\engine\training.py", line 1151, in train_step
        loss = self.compute_loss(x, y, y_pred, sample_weight)
    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\engine\training.py", line 1209, in compute_loss
        return self.compiled_loss(
    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\engine\compile_utils.py", line 277, in __call__
        loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\losses.py", line 143, in __call__
        losses = call_fn(y_true, y_pred)
    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\losses.py", line 270, in call  **
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\losses.py", line 2532, in binary_crossentropy
        backend.binary_crossentropy(y_true, y_pred, from_logits=from_logits),
    File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\backend.py", line 5822, in binary_crossentropy
        return tf.nn.sigmoid_cross_entropy_with_logits(

    ValueError: `logits` and `labels` must have the same shape, received ((None, 1) vs (None, 200)).

Where am I going wrong?

Print and check whether the labels going in actually have the desired length of 7. You may have an issue there.
Always use the ‘softmax’ activation in the last layer: it was built for cross-entropy, it will give you much more reliable results than sigmoid, and it is much more efficient computationally.

Hi @The_Machine_Preacher ,

welcome to the forum :tada:.

Please keep an eye on the selected loss, since BinaryCrossentropy is only for binary (0 or 1) classification.
Can you try tf.keras.losses.CategoricalCrossentropy(from_logits=False) or from_logits=True?

Please feel free to share the shape of your labels (difficult to read from the code) …
print(labels_train.shape)
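
For instance, a minimal sketch of the suggested change (assuming import tensorflow as tf, one-hot labels of shape (num_samples, 7), and the model/optimizer names from your snippet):

model.compile(loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
              optimizer=opt,
              metrics=['accuracy'])
# use from_logits=True instead if the final Dense layer has no activation (raw logits)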

Looking forward,
Dennis

The error you’re encountering (ValueError: logits and labels must have the same shape, received ((None, 1) vs (None, 200))) suggests a mismatch between the shape of the predictions your model is generating and the shape of your target labels.

Given that you’re working on a classification problem with 7 labels, there are a few potential issues to address:

  1. Label Encoding: For a multi-class classification problem with 7 categories, you should ensure your labels are one-hot encoded, resulting in a label array of shape [num_samples, 7], where num_samples is the number of examples in your dataset. There seems to be some confusion in how you’re handling labels, especially since you’re applying padding to them, which is unusual for categorical labels.
  2. Final Layer Activation: For multi-class classification, it’s standard to use a softmax activation function in the final layer, not sigmoid. Softmax ensures that the output probabilities sum to 1, making it suitable for multi-class classification. Change the activation function of your final Dense layer to softmax:

model.add(Dense(7, activation='softmax'))

  3. Loss Function: When using softmax activation in the final layer for multi-class classification, the appropriate loss function is categorical_crossentropy, not binary_crossentropy. Update the loss function in your model compilation step:

model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer=opt)

  4. Review Data Preprocessing: Ensure that your label preprocessing correctly one-hot encodes the labels into a 2D array of shape [num_samples, 7]. The use of pad_sequences on labels is peculiar and is not appropriate unless you’re dealing with a sequence prediction problem, which doesn’t seem to be the case here.
  5. Model Output: Ensure the model’s output layer has the correct number of units (7 in your case) and matches the shape of your one-hot encoded labels. The error message indicates a mismatch, most likely due to incorrect handling of the label preprocessing.

Correct these aspects, and your model should be able to train without encountering the shape mismatch error. Here’s a revised snippet for your label encoding and model compilation:

from tensorflow.keras.utils import to_categorical

# Assuming 'labels' is an array of integer class labels
labels = to_categorical(labels, num_classes=7)

# Update your model compilation
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

Ensure your labels are correctly one-hot encoded and your model’s final layer and loss function are appropriately set up for a multi-class classification task.
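
Note that the labels in the original post are strings (e.g. "Migraine with aura"), so they first have to be mapped to integers before one-hot encoding. A minimal sketch, assuming scikit-learn's LabelEncoder and the variable names from the question:

from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical

label_encoder = LabelEncoder()
labels_int = label_encoder.fit_transform(labels)           # map the 7 label strings to integers 0..6
labels_onehot = to_categorical(labels_int, num_classes=7)  # shape: (num_samples, 7)

# split features and one-hot labels together so the rows stay aligned
features_train, features_test, labels_train, labels_test = train_test_split(
    features, labels_onehot, test_size=0.33, random_state=42)

With this, the tokenizing/padding in word_embeddings() should be applied to the features only, and labels_train/labels_test stay as the one-hot arrays expected by the softmax output and categorical_crossentropy.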

" model.add(Dense(7, activation=‘sigmoid’)) "

" ValueError: logits and labels must have the same shape, received ((None, 3) vs (None, 1)). "
this error occured to me…where my last dense layer = Dense(3,activation = ‘sigmoid’). Since I need 3 outputs. Since my model belongs to BinaryClassification, it was genrating 1 output

so i change my last dense layer to = Dense(1,activation = ‘sigmoid’).
which solved my error
hope it can be related
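
For what it's worth, if you really do need 3 classes with integer labels (rather than a single binary output), a minimal sketch that keeps 3 output units without the shape mismatch (assuming a Keras Sequential model):

model.add(Dense(3, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

sparse_categorical_crossentropy accepts integer labels of shape (None,) or (None, 1) directly, so the (None, 3) outputs no longer clash with them.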

I have a similar error:

ValueError: 'logits' and 'labels' must have the same shape, received ((None, 2) vs (None, 1)).

My dataframe has two columns: Links (string type, containing titles of news articles) and Shortlisted (numeric type, with value either 0 (not shortlisted) or 1 (shortlisted)).

I am working on a binary classification (NLP) project using BERT (by Google).
My code is throwing this error:
ValueError: logits and labels must have the same shape, received ((None, 2) vs (None, 1)).

And I am not understanding what this means and how to fix this.

  1. Splitting into training and testing data
df['Length'].max()
train_df = df[~(df['Sheet'].str.contains('April2024', regex=True) | df['Sheet'].str.contains('May2024', regex=True))]
test_df = df[len(train_df):]
  2. Importing libraries and classes
from transformers import BertTokenizer, create_optimizer, TFBertForSequenceClassification
from sklearn.model_selection import train_test_split
import tensorflow as tf
  3. Split the data into training, validation, and test sets
train_texts, train_labels = train_df['Links'].tolist(), train_df['Shortlisted'].tolist()
val_texts, test_texts, val_labels, test_labels = train_test_split(
    test_df['Links'].tolist(), test_df['Shortlisted'].tolist(), test_size=0.5, random_state=42)
  4. BERT Tokenizer (the max_length is 163, as the longest title that I have in the 'Links' column is of length 163)
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')

def tokenize_function(texts):
    return tokenizer(texts, padding='max_length', truncation=True, max_length=163, return_tensors="tf")

train_encodings = tokenize_function(train_texts)
val_encodings = tokenize_function(val_texts)
test_encodings = tokenize_function(test_texts)
  5. Creating a TensorFlow dataset
train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_encodings),
    train_labels
)).shuffle(len(train_texts)).batch(8)

val_dataset = tf.data.Dataset.from_tensor_slices((
    dict(val_encodings),
    val_labels
)).batch(16)

test_dataset = tf.data.Dataset.from_tensor_slices((
    dict(test_encodings),
    test_labels
)).batch(16)
  6. Fetching pre-trained model
model = TFBertForSequenceClassification.from_pretrained('bert-base-cased', num_labels=2)
  7. The error is here!!! (when training and validating the model; I have set epochs=1 as I only have around 9,700 records to train the model)
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=tf.keras.metrics.CategoricalAccuracy())

history = model.fit(train_dataset, epochs=1, validation_data=val_dataset)
  8. Testing model
results = model.evaluate(test_dataset)

print(results)
  9. Saving model
model.save_pretrained('./fine-tuned-bert')

tokenizer.save_pretrained('./fine-tuned-bert')
  10. Testing model on a new article
model = TFBertForSequenceClassification.from_pretrained('./fine-tuned-bert')
tokenizer = BertTokenizer.from_pretrained('./fine-tuned-bert')

def predict(text):
    inputs = tokenizer(text, return_tensors='tf', padding=True, truncation=True, max_length=128)
    outputs = model(**inputs)
    predictions = tf.argmax(outputs.logits, axis=-1)
    return predictions

new_text = "Green energy to drive power sector investment, coal to remain significant: Moody's"
predicted_label = predict(new_text)
print(predicted_label)

Also, this is my first time working on an NLP project. Any suggestions for improving this code would be highly appreciated!

P.S.: the snippet of code below did not throw an error and gave an accuracy above 95%. The test data also had 100% accuracy.

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(train_dataset, epochs=1, validation_data=val_dataset)
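
For reference, the original mismatch comes from num_labels=2 producing logits of shape (None, 2) while the 0/1 labels have shape (None, 1), which BinaryCrossentropy cannot reconcile. The other way around it is to keep BinaryCrossentropy but give the model a single output unit; an untested sketch, assuming the same imports and datasets as above:

model = TFBertForSequenceClassification.from_pretrained('bert-base-cased', num_labels=1)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.0)])  # threshold 0.0 because the model outputs raw logits
history = model.fit(train_dataset, epochs=1, validation_data=val_dataset)

Also note that CategoricalAccuracy expects one-hot labels; with integer 0/1 labels, BinaryAccuracy (or plain 'accuracy' with the sparse loss, as in the P.S.) is the matching metric.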