Breaking the computational graph and running train_step multiple times

Hi,

I’m still trying to understand why my Faster R-CNN implementation converges incorrectly relative to a nearly equivalent PyTorch implementation, and I’ve narrowed it down to two possible culprits:

  1. A broken computational graph and my use of train_on_batch twice per training step.
  2. My RoI pooling implementation (I’ve never been able to find working sample code for this and have had to implement it myself; a rough sketch of the kind of thing I mean is below).
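
For context, here is a rough sketch of the style of RoI pooling I mean, built on tf.image.crop_and_resize followed by a max pool; the 7x7 pool size and the names are placeholders rather than my exact code:

import tensorflow as tf

def roi_pool(feature_map, rois, box_indices, pool_size=7):
    # feature_map: (batch, H, W, C) output of the shared VGG-16 layers.
    # rois: (num_rois, 4) boxes as (y1, x1, y2, x2), normalized to [0, 1].
    # box_indices: (num_rois,) int32 index of the image each RoI came from.
    # crop_and_resize bilinearly samples each RoI to 2*pool_size, then a 2x2
    # max pool reduces it to pool_size x pool_size, approximating RoI max pooling.
    crops = tf.image.crop_and_resize(
        feature_map, rois, box_indices,
        crop_size=[pool_size * 2, pool_size * 2])
    return tf.nn.max_pool2d(crops, ksize=2, strides=2, padding="SAME")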

I build two models that share many layers (e.g., the VGG-16 input layers). The first model produces a series of bounding boxes and scores, which are then fed into the second model to produce the final bounding boxes and scores. The second model also takes the output of the first model's shared layers. The topology looks roughly like this (a minimal Keras sketch of the wiring follows the diagram):

               Shared Layers
                     |
        +------------+--------------------+
        |                                 |
     rpn_model Layers                     |
        |                                 |
      rpn_model outputs                   |
        |                                 |
        |                                 |
        +--------------->        classifier_model Layers
                                          |
                                  classifier_model outputs
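
For concreteness, here is a minimal functional-API sketch of that wiring; the layer types, sizes, and the GlobalAveragePooling2D stand-in for RoI pooling are placeholders, not my real network:

from tensorflow.keras import layers, Model

# Shared trunk: both models call the SAME layer object, so there is exactly
# one copy of these weights.
image_input = layers.Input(shape=(224, 224, 3), name="image")
shared_conv = layers.Conv2D(64, 3, padding="same", activation="relu", name="shared_conv")
features = shared_conv(image_input)

# RPN branch on top of the shared features.
rpn_out = layers.Conv2D(9, 1, activation="sigmoid", name="rpn_scores")(features)
rpn_model = Model(image_input, rpn_out, name="rpn_model")

# Classifier branch: same shared features plus externally supplied boxes.
boxes_input = layers.Input(shape=(4,), name="boxes")
pooled = layers.GlobalAveragePooling2D()(features)   # stand-in for RoI pooling
cls_in = layers.Concatenate()([pooled, boxes_input])
cls_out = layers.Dense(21, activation="softmax", name="cls_scores")(cls_in)
classifier_model = Model([image_input, boxes_input], cls_out, name="classifier_model")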

I train by calling train_on_batch on each model, which should update the shared layers twice (once per call):

# Forward pass to get proposals for the classifier stage.
rpn_predictions = rpn_model.predict_on_batch(x=image)
# First gradient update: RPN loss, through the RPN and shared layers.
rpn_losses = rpn_model.train_on_batch(x=image, y=y_true)
# ... code to generate boxes from rpn_predictions ...
# Second gradient update: classifier loss, through the classifier head and shared layers.
classifier_losses = classifier_model.train_on_batch(x=[image, boxes], y=[y_true_classifier])

Is this doing what I assume it is doing? It should be possible to backprop from the two different model outputs, updating all the layers. Am I missing some step or is Keras potentially doing something non-obvious here?

Thanks,

Bart

Hello @Bart,
Thank you for using TensorFlow.

In this training pipeline you have to be careful about how gradients flow. Calling train_on_batch on each model separately runs two independent forward and backward passes: the shared layers receive two separate updates (one per loss) rather than a single update computed from the combined loss, and because the boxes are generated outside the graph from rpn_predictions, the classifier loss cannot propagate back into the RPN-specific layers at all. Instead of train_on_batch, consider writing a custom training loop that computes both losses from both outputs under a single gradient tape and updates all the layers accordingly.
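
A minimal sketch of such a loop, assuming the two models are already built and that generate_boxes, rpn_loss_fn, and cls_loss_fn are placeholders for your own proposal-generation and loss code:

import tensorflow as tf

optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9)

def train_step(image, y_true_rpn, y_true_cls):
    with tf.GradientTape() as tape:
        # Both forward passes run under one tape, so a single combined loss
        # drives one gradient computation for every layer, shared or not.
        rpn_out = rpn_model(image, training=True)
        # generate_boxes stands in for your proposal generation; it is treated
        # as non-differentiable here, so gradients stop at the boxes.
        boxes = tf.stop_gradient(generate_boxes(rpn_out))
        cls_out = classifier_model([image, boxes], training=True)
        loss = rpn_loss_fn(y_true_rpn, rpn_out) + cls_loss_fn(y_true_cls, cls_out)
    # Deduplicate variables so each shared weight appears only once.
    unique_vars = list({v.ref(): v for v in
                        rpn_model.trainable_variables
                        + classifier_model.trainable_variables}.values())
    grads = tape.gradient(loss, unique_vars)
    optimizer.apply_gradients(zip(grads, unique_vars))
    return loss

Because every variable appears only once in the apply_gradients call, each shared weight gets exactly one update per step, driven by the sum of both losses.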