Using Sparse Categorical CrossEntropy, the loss becomes negative

I am following the TensorFlow tutorial "Neural machine translation with a Transformer and Keras", but with my own data. My data is not text, but it is still made of sequences of tokens, with a start token and an end token. The content tokens go from 0 to 30 (the start token is 31, the end token is 32).
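To make the setup concrete, this is roughly how the sequences are laid out (the constant names below are just for illustration, not from my actual code):

```python
import numpy as np

# Hypothetical constants matching the description above:
# content tokens are 0..30, the start token is 31, the end token is 32.
START_TOKEN = 31
END_TOKEN = 32
VOCAB_SIZE = 33  # the model has to predict over 33 classes in total

def add_special_tokens(sequence):
    """Wrap a raw token sequence with the start and end tokens."""
    return np.concatenate(([START_TOKEN], sequence, [END_TOKEN]))

print(add_special_tokens(np.array([4, 17, 30])))  # [31  4 17 30 32]
```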

My code is very similar to the one from the tutorial, with very small changes:

I am using sparse categorical cross-entropy, as in the tutorial.
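Concretely, the loss is the masked sparse categorical cross-entropy from the tutorial, roughly like this (reproduced from memory as a sketch, so it may differ slightly from my exact code):

```python
import tensorflow as tf

def masked_loss(label, pred):
    # As in the tutorial, label 0 is treated as padding and excluded.
    mask = label != 0
    loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction='none')
    loss = loss_object(label, pred)

    mask = tf.cast(mask, dtype=loss.dtype)
    loss *= mask

    # Average only over the non-padded positions.
    return tf.reduce_sum(loss) / tf.reduce_sum(mask)
```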

The problem is that the loss displayed as training progresses becomes negative. For example, right now it shows -2.0593.

When I monitor the loss of some test batches with a callback (using the same sparse categorical cross-entropy), the value returned is never negative; it is usually somewhere between 1.5 and 2.
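The monitoring callback is something like this (a simplified sketch; the class name and the shape of test_batches are just placeholders, not my exact code):

```python
import tensorflow as tf

class TestBatchLossCallback(tf.keras.callbacks.Callback):
    """Recompute the loss on a few held-out batches after every epoch."""

    def __init__(self, test_batches):
        super().__init__()
        self.test_batches = test_batches
        self.loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
            from_logits=True)

    def on_epoch_end(self, epoch, logs=None):
        losses = []
        for inputs, labels in self.test_batches:
            preds = self.model(inputs, training=False)
            losses.append(float(self.loss_fn(labels, preds)))
        print(f"epoch {epoch}: mean test-batch loss = "
              f"{sum(losses) / len(losses):.4f}")
```

The values printed here are always in the 1.5 to 2 range, never negative.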

As you can see, the last layer of my Transformer is a Dense layer with no activation function (as in the tutorial), so in the loss I set "from_logits=True". I have also tried adding a softmax activation to that last layer and setting "from_logits=False", but then the model does not seem to train: the loss gets stuck at around 0.274 and never moves.
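To be explicit, these are the two output configurations I am comparing (a sketch; VOCAB_SIZE = 33 is based on my tokens going from 0 to 32):

```python
import tensorflow as tf

VOCAB_SIZE = 33  # tokens 0..32

# Option A (as in the tutorial): raw logits + from_logits=True
logits_head = tf.keras.layers.Dense(VOCAB_SIZE)  # no activation
loss_a = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Option B (what I tried): softmax in the last layer + from_logits=False
probs_head = tf.keras.layers.Dense(VOCAB_SIZE, activation='softmax')
loss_b = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
```

My understanding is that, for the same weights, the two options should give essentially the same loss, with option A being the numerically more stable one, so I don't understand why only option B gets stuck.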

I have no idea why the loss becomes negative during training, when the loss function always seems to output positive numbers when I test it myself.

Overall, the training results are also not good.

Hi @joanmanel, the loss is just a scalar that you are trying to minimize; it does not have to be positive. Thank you.

Thanks @Kiran_Sai_Ramineni. How come, when I set from_logits=False and apply a softmax in the last layer, the loss doesn't change at all?

I am not an expert in LLMs, but as described the problem is difficult to debug, and it is impossible to give you more help without more details.

One somewhat useful test would be to replace your custom data with the data used in the tutorial: does it train well then?
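As a sketch of what I mean (assuming your pipeline can still consume the tutorial's dataset; the dataset name below is the one the tutorial loads, if I remember correctly):

```python
import tensorflow_datasets as tfds

# Load the Portuguese-English dataset from the tutorial and run it through
# the otherwise unchanged training pipeline as a sanity check.
examples, metadata = tfds.load('ted_hrlr_translate/pt_to_en',
                               with_info=True, as_supervised=True)
train_examples, val_examples = examples['train'], examples['validation']
```

If the loss stays non-negative and decreases on this data, the problem is more likely in your custom data or its tokenization than in the model code.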

In my humble opinion, we need to ask the right questions to narrow the problem down more precisely.