How to build a multi-output model using model subclassing?

I need to create a model that takes a movie-poster image and returns an age rating (multi-class classification) and a list of genres (multi-label classification).

Model code:

# Imports used across the snippets below
import tensorflow as tf
from tensorflow import Tensor
from tensorflow.keras import Model
from tensorflow.keras.layers import Rescaling, Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.losses import CategoricalCrossentropy


class NNetwork(Model):
    def __init__(self, rating: int, genres: int):
        super(NNetwork, self).__init__()
        self.rescaling = Rescaling(1.0 / 255.0)  # scale pixel values to [0, 1]

        self.convolution = [
            Conv2D(16, 3, padding='same', activation='relu'),
            Conv2D(32, 3, padding='same', activation='relu'),
            Conv2D(64, 3, padding='same', activation='relu')
        ]
        self.pooling = [
            MaxPooling2D(),
            MaxPooling2D(),
            MaxPooling2D()
        ]
        self.flatten = Flatten()

        self.dense = [
            Dense(128, activation='relu'),        # shared hidden layer
            Dense(rating, activation='softmax'),  # age-rating head
            Dense(genres, activation='softmax')   # genres head
        ]

    def call(self, images: Tensor, training=None, **kwargs) -> tuple[Tensor, Tensor]:
        x = self.rescaling(images)

        # three convolution + pooling stages
        for i in range(3):
            x = self.convolution[i](x)
            x = self.pooling[i](x)

        x = self.flatten(x)

        x = self.dense[0](x)

        rating = self.dense[1](x)
        genres = self.dense[2](x)

        return rating, genres

Loss function code:

def loss(true: Tensor,
         predict: Tensor) -> Tensor:
    
    categorical = CategoricalCrossentropy(
        reduction=None
    )

    loss = tf.reduce_mean(categorical(
        y_true=true,
        y_pred=predict
    ))

    return loss

Training code:

@tf.function
def train_step(images_batch, rating_batch, genres_batch):
    with tf.GradientTape() as tape:
        rating_predict, genres_predict = model(images_batch)
        
        rating_loss = loss(rating_batch, rating_predict)
        genres_loss = loss(genres_batch, genres_predict)
        
    # passing a list of losses to tape.gradient differentiates their sum
    gradients = tape.gradient([rating_loss, genres_loss], model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    rating_score.update_state(rating_batch, rating_predict)
    genres_score.update_state(genres_batch, genres_predict)
    
    return rating_loss, genres_loss

Main loop code:

for n in range(EPOCHS):
    total_loss = 0
    total_rating_loss = 0
    total_genres_loss = 0
    
    for inputs, outputs in train:
        images_batch = inputs['image']
        rating_batch = outputs['rating']
        genres_batch = outputs['genres']
        
        rating_loss, genres_loss = train_step(images_batch, rating_batch, genres_batch)

        total_loss += (rating_loss + genres_loss)
        total_rating_loss += rating_loss
        total_genres_loss += genres_loss
    
    print(f'EPOCHS: {n} - total_loss: {total_loss.numpy()}, total_rating_loss: {total_rating_loss.numpy()}, total_genres_loss: {total_genres_loss.numpy()}')

Training starts, but the loss at every epoch is huge, on the order of 10^15. This suggests the training procedure is set up incorrectly.

I'm guessing at what the problem might be:

  1. The loss function is chosen or configured incorrectly.
  2. The gradients are computed or applied incorrectly.

What else could cause such a large loss value?

Hi @gsimonx37. Is there any specific reason why you set reduction=None?

categorical = CategoricalCrossentropy(
    reduction=None
)

Hence those big numbers, no?

The TensorFlow documentation reads:

reduction: Type of reduction to apply to the loss. In almost all cases this should be "sum_over_batch_size".
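
For comparison, here is a minimal sketch of the same loss helper with the reduction left at its documented default ("sum_over_batch_size"), which already averages over the batch, so the extra tf.reduce_mean becomes unnecessary. This assumes one-hot labels and softmax outputs, as in your model; whether it alone explains the huge values is only my guess:

import tensorflow as tf
from tensorflow import Tensor
from tensorflow.keras.losses import CategoricalCrossentropy

# With the default reduction, the call below returns a single scalar
# averaged over the batch, instead of one value per sample.
categorical = CategoricalCrossentropy()

def loss(true: Tensor, predict: Tensor) -> Tensor:
    return categorical(y_true=true, y_pred=predict)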