Breaking the computational graph and running train_step multiple times

Hi,

I’m still trying to understand why my Faster R-CNN implementation converges incorrectly relative to a nearly equivalent PyTorch implementation, and I’ve narrowed it down to two possible culprits:

  1. A broken computational graph and my use of train_on_batch twice per training step.
  2. My RoI pooling implementation (I’ve never been able to find working sample code for this and have had to implement it myself; a rough sketch of the kind of thing I mean is below).
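
For context, here is a rough sketch of the style of RoI pooling I mean, built on tf.image.crop_and_resize followed by a max pool; the 7x7 pool size and the names are placeholders rather than my exact code:

import tensorflow as tf

def roi_pool(feature_map, rois, box_indices, pool_size=7):
    # feature_map: (batch, H, W, C) output of the shared VGG-16 layers.
    # rois: (num_rois, 4) boxes as (y1, x1, y2, x2), normalized to [0, 1].
    # box_indices: (num_rois,) int32 index of the image each RoI came from.
    # crop_and_resize bilinearly samples each RoI to 2*pool_size, then a 2x2
    # max pool reduces it to pool_size x pool_size, approximating RoI max pooling.
    crops = tf.image.crop_and_resize(
        feature_map, rois, box_indices,
        crop_size=[pool_size * 2, pool_size * 2])
    return tf.nn.max_pool2d(crops, ksize=2, strides=2, padding="SAME")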

I build two models that share many layers (e.g., the VGG-16 input layers). The first model produces a series of bounding boxes and scores, which are then fed into the second model to produce the final bounding boxes and scores. The second model also takes the output of the first model's shared layers. The topology looks roughly like this (a minimal Keras sketch of the wiring follows the diagram):

               Shared Layers
                     |
        +------------+--------------------+
        |                                 |
     rpn_model Layers                     |
        |                                 |
      rpn_model outputs                   |
        |                                 |
        |                                 |
        +--------------->        classifier_model Layers
                                          |
                                  classifier_model outputs
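
For concreteness, here is a minimal functional-API sketch of that wiring; the layer types, sizes, and the GlobalAveragePooling2D stand-in for RoI pooling are placeholders, not my real network:

from tensorflow.keras import layers, Model

# Shared trunk: both models call the SAME layer object, so there is exactly
# one copy of these weights.
image_input = layers.Input(shape=(224, 224, 3), name="image")
shared_conv = layers.Conv2D(64, 3, padding="same", activation="relu", name="shared_conv")
features = shared_conv(image_input)

# RPN branch on top of the shared features.
rpn_out = layers.Conv2D(9, 1, activation="sigmoid", name="rpn_scores")(features)
rpn_model = Model(image_input, rpn_out, name="rpn_model")

# Classifier branch: same shared features plus externally supplied boxes.
boxes_input = layers.Input(shape=(4,), name="boxes")
pooled = layers.GlobalAveragePooling2D()(features)   # stand-in for RoI pooling
cls_in = layers.Concatenate()([pooled, boxes_input])
cls_out = layers.Dense(21, activation="softmax", name="cls_scores")(cls_in)
classifier_model = Model([image_input, boxes_input], cls_out, name="classifier_model")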

I train by calling train_on_batch on each model, which should update the shared layers twice (once per call):

# Forward pass to get proposals for the classifier stage.
rpn_predictions = rpn_model.predict_on_batch(x=image)
# First gradient update: RPN loss, through the RPN and shared layers.
rpn_losses = rpn_model.train_on_batch(x=image, y=y_true)
# ... code to generate boxes from rpn_predictions ...
# Second gradient update: classifier loss, through the classifier head and shared layers.
classifier_losses = classifier_model.train_on_batch(x=[image, boxes], y=[y_true_classifier])

Is this doing what I assume it is doing? It should be possible to backprop from the two different model outputs, updating all the layers. Am I missing some step or is Keras potentially doing something non-obvious here?

Thanks,

Bart

Hello @Bart,
Thank you for using TensorFlow.

In this training pipeline you have to be careful about how gradients flow. Calling train_on_batch on each model separately runs two independent forward and backward passes: the shared layers receive two separate updates (one per loss) rather than a single update computed from the combined loss, and because the boxes are generated outside the graph from rpn_predictions, the classifier loss cannot propagate back into the RPN-specific layers at all. Instead of train_on_batch, consider writing a custom training loop that computes both losses from both outputs under a single gradient tape and updates all the layers accordingly.
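
A minimal sketch of such a loop, assuming the two models are already built and that generate_boxes, rpn_loss_fn, and cls_loss_fn are placeholders for your own proposal-generation and loss code:

import tensorflow as tf

optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9)

def train_step(image, y_true_rpn, y_true_cls):
    with tf.GradientTape() as tape:
        # Both forward passes run under one tape, so a single combined loss
        # drives one gradient computation for every layer, shared or not.
        rpn_out = rpn_model(image, training=True)
        # generate_boxes stands in for your proposal generation; it is treated
        # as non-differentiable here, so gradients stop at the boxes.
        boxes = tf.stop_gradient(generate_boxes(rpn_out))
        cls_out = classifier_model([image, boxes], training=True)
        loss = rpn_loss_fn(y_true_rpn, rpn_out) + cls_loss_fn(y_true_cls, cls_out)
    # Deduplicate variables so each shared weight appears only once.
    unique_vars = list({v.ref(): v for v in
                        rpn_model.trainable_variables
                        + classifier_model.trainable_variables}.values())
    grads = tape.gradient(loss, unique_vars)
    optimizer.apply_gradients(zip(grads, unique_vars))
    return loss

Because every variable appears only once in the apply_gradients call, each shared weight gets exactly one update per step, driven by the sum of both losses.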