I have seen some code that uses log_softmax as the activation of the final Dense layer together with from_logits=True in the cross-entropy loss, in order to get a numerically stable softmax computation. How does that compare with using a linear activation in the Dense layer together with from_logits=True in the cross-entropy loss? Isn't the softmax duplicated in the first case, since the cross-entropy loss already performs the log-softmax calculation internally when from_logits=True?
Hi @khteh, yes, you are correct. You should use a linear output layer with from_logits=True in your loss. Using log_softmax as the activation while also passing from_logits=True is redundant and misuses the API that was designed for a stable calculation. Thank you!
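To see why the combination is redundant rather than numerically wrong: from_logits=True applies log-softmax internally, and log_softmax(x) = x - logsumexp(x) only shifts each row by a constant, which softmax is invariant to. The NumPy sketch below (the helper names `log_softmax` and `xent_from_logits` are illustrative, not the actual Keras internals) shows that feeding log-softmax outputs into a from_logits-style loss yields the same value as feeding raw logits:

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax along the last axis.
    shifted = x - x.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def xent_from_logits(logits, labels):
    # Mimics what from_logits=True does: apply log-softmax internally,
    # then take the negative log-probability of the true class.
    logp = log_softmax(logits)
    return -logp[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])

# Correct usage: raw (linear) outputs + from_logits-style loss.
loss_linear = xent_from_logits(logits, labels)

# Redundant usage: log_softmax outputs fed into the same loss,
# so log-softmax is effectively applied twice.
loss_redundant = xent_from_logits(log_softmax(logits), labels)

print(loss_linear, loss_redundant)  # the two values are identical
```

Because the two losses agree, applying log_softmax in the Dense layer does no extra harm here, but it adds an unnecessary op per forward pass and obscures intent, which is why linear output + from_logits=True is the recommended pattern.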
Hello, how are you?
I think that softmax can help with many issues.
But I want to ask about WAP, and how I can turn my websites into an application.