Almost no training since switching to from_tensor_slices

Hi everyone,

I am a bit puzzled here: I have a classic CNN that trained perfectly well when I fed it my data the usual way. Due to technical constraints, I am now wrapping my data in tf.data.Dataset objects built with from_tensor_slices.

The trouble is that since the switch my training has plateaued at around 20% val_f1/accuracy/IoU, and the loss does not decrease at all. Everything was fine before I used from_tensor_slices.
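To make the change concrete, here is a minimal sketch of the two feeding styles (assuming "the usual way" means passing NumPy arrays straight to model.fit; the model and data below are throwaway placeholders, not my actual setup):

import numpy as np
import tensorflow as tf

# Throwaway data and model, only to contrast the two feeding styles.
X = np.random.rand(100, 64, 64, 3).astype("float32")
Y = tf.one_hot(np.random.randint(0, 5, size=(100, 64, 64)), depth=5).numpy()

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(5, 3, padding="same", activation="softmax",
                           input_shape=(64, 64, 3)),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

# Before: NumPy arrays passed directly; Keras batches and shuffles internally.
model.fit(X, Y, batch_size=8, epochs=1, shuffle=True)

# After: an explicit tf.data pipeline; batching and shuffling now happen in the
# pipeline, and batch_size must not be passed to fit() a second time.
ds = (tf.data.Dataset.from_tensor_slices((X, Y))
      .shuffle(len(X))
      .batch(8)
      .prefetch(tf.data.AUTOTUNE))
model.fit(ds, epochs=1)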

Here is a bit of my code:

import tensorflow as tf

# Scale with statistics fitted on the training set only
Xtrain_transformed = scaler.fit_transform(Xtrain.reshape(-1, 1)).reshape(Xtrain.shape)
Xtest_transformed = scaler.transform(Xtest.reshape(-1, 1)).reshape(Xtest.shape)
XtrainShape, XtestShape = Xtrain.shape[0], Xtest.shape[0]
self.scaler = scaler
del Xtrain, Xtest

if self.numClass > 2:
    # tf.one_hot appends the class axis, giving masks of shape (N, H, W, numClass)
    train_masks_cat = tf.one_hot(Ytrain, depth=self.numClass, dtype=tf.float32)
    Ytrain = tf.reshape(train_masks_cat, (tf.shape(Ytrain)[0], tf.shape(Ytrain)[1], tf.shape(Ytrain)[2], self.numClass))
    test_masks_cat = tf.one_hot(Ytest, depth=self.numClass, dtype=tf.float32)
    Ytest = tf.reshape(test_masks_cat, (tf.shape(Ytest)[0], tf.shape(Ytest)[1], tf.shape(Ytest)[2], self.numClass))
    del train_masks_cat, test_masks_cat

X_train_tensor = tf.convert_to_tensor(Xtrain_transformed, dtype=tf.float32)
X_test_tensor = tf.convert_to_tensor(Xtest_transformed, dtype=tf.float32)
y_train_tensor = tf.convert_to_tensor(Ytrain, dtype=tf.float32)
y_test_tensor = tf.convert_to_tensor(Ytest, dtype=tf.float32)

training_dataset = (tf.data.Dataset.from_tensor_slices((X_train_tensor, y_train_tensor))
                    .shuffle(XtrainShape)
                    .batch(self.batch_size)
                    .prefetch(tf.data.AUTOTUNE))

test_dataset = (tf.data.Dataset.from_tensor_slices((X_test_tensor, y_test_tensor))
                .shuffle(XtestShape)
                .batch(self.batch_size)
                .prefetch(tf.data.AUTOTUNE))

# These options matter when the dataset is consumed under a tf.distribute
# strategy; OFF disables automatic sharding of the data across workers.
options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.OFF

training_dataset = training_dataset.with_options(options)
test_dataset = test_dataset.with_options(options)

bestSize = tf.data.experimental.cardinality(training_dataset).numpy()
train_dataset = training_dataset.take(bestSize - bestSize % self.batch_size)
bestSize = tf.data.experimental.cardinality(test_dataset).numpy()
testing_dataset = test_dataset.take(bestSize - bestSize % self.batch_size)
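For transparency: since the dataset is already batched at this point, cardinality() returns the number of batches, not samples, so the modulo arithmetic above operates on batch counts. If the intent is simply to drop a partial final batch, a more direct sketch (same pipeline, assuming that intent) would be:

training_dataset = (tf.data.Dataset.from_tensor_slices((X_train_tensor, y_train_tensor))
                    .shuffle(XtrainShape)
                    .batch(self.batch_size, drop_remainder=True)  # drops the partial final batch
                    .prefetch(tf.data.AUTOTUNE))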

I have tried several hyperparameter variations, but overall I really don't understand why it worked fine before and no longer does.

I am also including my custom loss functions, adapted from various GitHub repositories, in case that helps with troubleshooting :slight_smile:

from tensorflow.keras import backend as K

def weighted_categorical_crossentropy(self):
    def loss(y_true, y_pred):
        # Per-pixel weight: the frequency of the true class at each position
        weights = tf.reduce_sum(self.util.getClassFrequency() * y_true, axis=-1)
        # Standard categorical cross-entropy, weighted per pixel
        loss = K.categorical_crossentropy(y_true, y_pred)
        weighted_loss = loss * weights
        return K.mean(weighted_loss)  # Return the mean weighted loss
    return loss
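For completeness, a minimal self-contained check of the weighting logic (the frequency vector below is made up; a confident prediction should score lower than a uniform one):

y_true = tf.one_hot([[1, 2], [0, 1]], depth=3)  # dummy masks, shape (2, 2, 3)
y_good = y_true * 0.98 + 0.01                   # near-perfect prediction
y_bad = tf.ones_like(y_true) / 3.0              # uniform prediction
freq = tf.constant([0.2, 0.5, 0.3])             # made-up class frequencies
weights = tf.reduce_sum(freq * y_true, axis=-1)
print(K.mean(K.categorical_crossentropy(y_true, y_good) * weights).numpy())  # small
print(K.mean(K.categorical_crossentropy(y_true, y_bad) * weights).numpy())   # larger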

def weighted_binary_crossentropy(self):
    # Adapted from https://github.com/huanglau/Keras-Weighted-Binary-Cross-Entropy/blob/master/DynCrossEntropy.py
    def loss(y_true, y_pred):
        # Element-wise binary cross-entropy
        bin_crossentropy = K.binary_crossentropy(y_true, y_pred)
        # Weight positives by the class-1 frequency and negatives by the class-0 frequency
        weights = y_true * self.util.getClassFrequency()[1] + (1. - y_true) * self.util.getClassFrequency()[0]
        weighted_bin_crossentropy = weights * bin_crossentropy
        return K.mean(weighted_bin_crossentropy)
    return loss
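For context, this is roughly how the loss is wired in (sketch; the optimizer, metric, and epochs below are placeholders, not my actual settings):

loss_fn = (self.weighted_categorical_crossentropy() if self.numClass > 2
           else self.weighted_binary_crossentropy())
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
model.fit(train_dataset, validation_data=testing_dataset, epochs=self.epochs)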

Thanks!

Hi @FloFive.
Could you please share your full code in a Colab, with tensors of the right shapes filled with random numbers? That may help us understand what is (not!) going on.
Thank you.

Unfortunately, I cannot share any more code due to intellectual property constraints :frowning:

The data are images, and the masks are one-hot encoded with n channels, where n is the number of classes.
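In case it helps, this is the kind of shape sanity check I can run on the pipeline (sketch; element_spec shows the per-batch shapes and dtypes without iterating the data):

# Verify that images and masks leave the pipeline with the expected shapes,
# e.g. (None, H, W, channels) for images and (None, H, W, numClass) for masks.
print(training_dataset.element_spec)
for images, masks in training_dataset.take(1):
    print(images.shape, masks.shape)
    print(float(tf.reduce_min(masks)), float(tf.reduce_max(masks)))  # expect 0.0 and 1.0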

Do you see anything I might have done wrong that could lead to flat metric curves?