Train_on_batch and train_step used in custom training loop giving different results

I have a custom model class which has a training method implemented using train_step.

class MyFancyModel(tfk.models.Model):

    ...

    def train_model(self, data, steps, message=None):
        @tf.function()
        def train_step(model_and_inputs):
            model, data = model_and_inputs
            return model.train_step((data,))

        history = {}
        from tqdm import trange
        for i in trange(steps):
            _history = train_step((self, data))

            for k, v in _history.items():
                # setdefault avoids a KeyError on the first step
                history.setdefault(k, []).append(float(v))

        return history

Recently, I realized I should probably use self.train_on_batch rather than using my own local function definition based on train_step. So, I rewrote this method as

    def train_model(self, data, steps, message=None):
        history = {}
        from tqdm import trange
        for i in trange(steps):
            _history = self.train_on_batch(data, return_dict=True)

            for k, v in _history.items():
                # setdefault avoids a KeyError on the first step
                history.setdefault(k, []).append(float(v))

        return history

I thought that would be that, but I noticed that the new version’s output is slightly worse. I’m scratching my head trying to figure out what might be the salient difference between these two implementations. I’d appreciate any relevant insights into the keras.models.Model innards.

Cheers!

Hi @kilodalton,

Sorry for not getting back to you sooner.

Both approaches operate on a single batch of data, and train_on_batch internally calls train_step. The key difference is the bookkeeping around that call: train_on_batch is a built-in method that builds the compiled train function and, in Keras 2.x, resets the model's metrics on each call (reset_metrics=True by default), whereas calling train_step directly inside your own tf.function lets the compiled metrics accumulate across steps. So the numbers you log can differ even when the gradient updates themselves are the same. Kindly refer to the GitHub source of train_on_batch and train_step to understand their functionality.
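A minimal sketch of that difference, assuming Keras 2.x and a compiled model (the tiny Dense model and random data below are placeholders, not your actual model):

```python
import numpy as np
import tensorflow as tf

# Placeholder model standing in for MyFancyModel.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse", metrics=["mae"])

x = np.random.rand(8, 4).astype("float32")
y = np.random.rand(8, 1).astype("float32")

# train_on_batch wraps train_step; the returned dict reflects
# only this call because metrics are reset each time by default.
h1 = model.train_on_batch(x, y, return_dict=True)

# Calling train_step directly skips that reset, so the compiled
# metrics keep accumulating a running mean across calls.
_ = model.train_step((tf.constant(x), tf.constant(y)))
h2 = model.train_step((tf.constant(x), tf.constant(y)))
# h2's metric values are running means over both calls above.
```

If you want the refactored train_model to report the same running-mean metrics as your original loop, Keras 2.x lets you pass reset_metrics=False to train_on_batch.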

Hope this helps. Thank you.