PyTorch code conversion into Keras

I’m trying to convert PyTorch code into TensorFlow. What is the equivalent of self.model_t.layer1[-1].register_forward_hook(hook_t) in TensorFlow/Keras?

    def hook_t(module, input, output):
        self.features_t.append(output)
    def hook_s(module, input, output):
        self.features_s.append(output)

    self.model_t = resnet18(pretrained=True).eval()
    for param in self.model_t.parameters():
        param.requires_grad = False

    self.model_t.layer1[-1].register_forward_hook(hook_t)
    self.model_t.layer2[-1].register_forward_hook(hook_t)
    self.model_t.layer3[-1].register_forward_hook(hook_t)

    self.model_s = resnet18(pretrained=False) # default: False
    self.model_s.layer1[-1].register_forward_hook(hook_s)
    self.model_s.layer2[-1].register_forward_hook(hook_s)
    self.model_s.layer3[-1].register_forward_hook(hook_s)

Thanks!

Have you checked:
https://github.com/tensorflow/tensorflow/issues/33478
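In Keras there is no direct equivalent of a forward hook; the usual substitute is to build a second Model whose outputs are the intermediate activations you want to capture. A minimal sketch, using a keras.applications ResNet50 only because tf.keras does not ship a ResNet18 (the layer names below are illustrative; check model.summary() of your own backbone):

    import tensorflow as tf

    backbone = tf.keras.applications.ResNet50(weights="imagenet", include_top=False)

    # Layers whose outputs you would have hooked in PyTorch (illustrative names).
    hooked_layers = ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out"]
    feature_outputs = [backbone.get_layer(name).output for name in hooked_layers]

    # This model returns the intermediate activations on every forward pass,
    # playing the role of the features appended by hook_t / hook_s.
    feature_extractor = tf.keras.Model(inputs=backbone.input, outputs=feature_outputs)

    features = feature_extractor(tf.random.uniform((1, 224, 224, 3)))  # list of 3 tensors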

Thanks,

I’m currently trying to implement the paper [2103.04257] Student-Teacher Feature Pyramid Matching for Anomaly Detection. The PyTorch implementation is pretty straightforward, but I have some issues with TensorFlow. I defined the model in the following way, but I don’t think it is correct: the results are quite different from PyTorch, also in terms of trainable parameters (~11M PyTorch vs ~3M TF).

    # Imports assumed by this snippet (Feature_Loss is my own helper, not shown here)
    from classification_models.tfkeras import Classifiers
    import tensorflow as tf
    from tensorflow.keras.optimizers import Adam

    def Define_Model(img_shape, num_channel):

        #---------------------------- ResNet-18 instance ---------------------------
        ResNet18, preprocess_input = Classifiers.get('resnet18')
        #----------------------------------------------------------------------------

        #--------------------------- Input tensor definition -----------------------
        input_tensor = tf.keras.Input(shape=(img_shape, img_shape, num_channel))
        #----------------------------------------------------------------------------

        #---------------------- Teacher and student ResNet definition --------------
        t_net = ResNet18(weights='imagenet', include_top=False, input_tensor=input_tensor, input_shape=(img_shape, img_shape, num_channel))
        s_net = ResNet18(weights=None, include_top=False, input_tensor=input_tensor, input_shape=(img_shape, img_shape, num_channel))
        #----------------------------------------------------------------------------

        #---------------------------- Rename network layers ------------------------
        for layer in t_net.layers:
            layer._name = 't_net_' + layer.name

        for layer in s_net.layers:
            layer._name = 's_net_' + layer.name
        #----------------------------------------------------------------------------

        #------------------ Set the teacher network as non-trainable ---------------
        for l in t_net.layers:
            l.trainable = False
        #----------------------------------------------------------------------------

        #------------------- Extract teacher intermediate layers -------------------
        intermediate_t_layer_1 = t_net.get_layer("t_net_stage1_unit2_conv2").output
        intermediate_t_layer_2 = t_net.get_layer("t_net_stage2_unit2_conv2").output
        intermediate_t_layer_3 = t_net.get_layer("t_net_stage3_unit2_conv2").output
        #----------------------------------------------------------------------------

        #------------------- Extract student intermediate layers -------------------
        intermediate_s_layer_1 = s_net.get_layer("s_net_stage1_unit2_conv2").output
        intermediate_s_layer_2 = s_net.get_layer("s_net_stage2_unit2_conv2").output
        intermediate_s_layer_3 = s_net.get_layer("s_net_stage3_unit2_conv2").output
        #----------------------------------------------------------------------------

        #----------------------------------- Outputs -------------------------------
        out_1 = [intermediate_t_layer_1, intermediate_t_layer_2, intermediate_t_layer_3]
        out_2 = [intermediate_s_layer_1, intermediate_s_layer_2, intermediate_s_layer_3]
        #----------------------------------------------------------------------------

        #------------------------------------ Model --------------------------------
        model = tf.keras.Model(inputs=input_tensor, outputs=[out_1, out_2])
        #----------------------------------------------------------------------------

        #----------------------------------- Compile -------------------------------
        model.add_loss(Feature_Loss(input_tensor, out_1, out_2))
        model.compile(Adam(lr=0.4), loss=None)
        #----------------------------------------------------------------------------

        return model, t_net, s_net
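To see where the trainable-parameter gap comes from, one thing to check is the split reported by the model itself. A minimal sketch of how to count it, assuming the Define_Model above and an illustrative input size:

    # Hypothetical usage of Define_Model above, just to inspect the parameter split.
    model, t_net, s_net = Define_Model(img_shape=256, num_channel=3)

    trainable = sum(tf.keras.backend.count_params(w) for w in model.trainable_weights)
    frozen = sum(tf.keras.backend.count_params(w) for w in model.non_trainable_weights)
    print(f"trainable: {trainable:,}   non-trainable: {frozen:,}")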

Daniele

Have you tried to compare the two models with a model summary, a graph or any other visualization tool?

I compared the models with a summary. The total parameters of the two models (TF and PyTorch) are substantially equal; it is the trainable parameters that are very different. It seems that the TF model is truncated after the third residual block. Is the TF model definition correct? The loss is a measure of the distance between the teacher and student features. Here is the PyTorch implementation: GitHub - hcw-00/STPM_anomaly_detection: Unofficial pytorch implementation of Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection

Can you post the Netron graph of the two Networks?

Hmm, it seems that it is not possible to upload images in a message.

Yes, as you are new to the forum, you need to level up through the Discuss gamification system to enable more permissions.

Do you have a link?

OK, I have never used the Netron tool; I’ll start by sharing the Keras graph and the PyTorch summary.

It is hard to follow the connections in a PyTorch summary, but in the Keras graph I don’t see the intermediate connections between the student and the teacher.

I am not sure what kind of tool you could use in PyTorch to visualize the graph connections:

OK, I’ll try to visualize the PyTorch graph; I don’t usually use PyTorch.

Thanks for your support

Should the blocks of the two networks be connected?

I don’t have the PyTorch graph, but quickly checking the mentioned PyTorch implementation, it seems not. The output feature maps from the teacher and student models are just used to compute the loss in a for loop.

That’s exactly what I also understood from the paper. The teacher net isn’t trainable; it only provides the features as a reference for the student net. It seems like a quite simple model, but I cannot reproduce the results with TF.
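For reference, here is a minimal sketch of that per-level loss written in TF, assuming (as in the paper) that each feature map is L2-normalized along the channel axis before taking the squared distance; the function name and signature are just illustrative:

    import tensorflow as tf

    def feature_loss(teacher_features, student_features):
        """teacher_features / student_features: lists of (B, H, W, C) tensors."""
        total = 0.0
        for f_t, f_s in zip(teacher_features, student_features):
            f_t = tf.math.l2_normalize(f_t, axis=-1)  # normalize along channels
            f_s = tf.math.l2_normalize(f_s, axis=-1)
            # 0.5 * squared L2 distance per spatial position, averaged over the map
            total += 0.5 * tf.reduce_mean(tf.reduce_sum(tf.square(f_t - f_s), axis=-1))
        return total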

I’ve not checked the paper details.

Can you try to adapt this tutorial to your specific use case?

I viewed the tutorial, but it focuses on logits distillation, which is simpler.

As that example has a custom train loop/step, I think you could customize your loss as you want there.
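For example, a rough adaptation of that tutorial’s pattern to feature matching might look like the sketch below. It assumes teacher and student are multi-output feature extractors (as sketched earlier) and reuses the feature_loss above; this is only an assumption-laden sketch, not the paper’s reference implementation:

    import tensorflow as tf

    class STPMTrainer(tf.keras.Model):
        """Hypothetical wrapper: trains only the student to match the teacher's features."""
        def __init__(self, teacher, student):
            super().__init__()
            self.teacher = teacher
            self.student = student
            self.teacher.trainable = False  # the teacher stays frozen

        def train_step(self, images):
            # The anomaly-free training set yields only images, no labels.
            teacher_features = self.teacher(images, training=False)
            with tf.GradientTape() as tape:
                student_features = self.student(images, training=True)
                loss = feature_loss(teacher_features, student_features)
            grads = tape.gradient(loss, self.student.trainable_variables)
            self.optimizer.apply_gradients(zip(grads, self.student.trainable_variables))
            return {"loss": loss}

    # Illustrative usage (optimizer settings are an assumption, not prescribed here):
    # trainer = STPMTrainer(teacher_extractor, student_extractor)
    # trainer.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.4, momentum=0.9))
    # trainer.fit(train_dataset, epochs=100)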

I’ll try. I have a question: if my output is an intermediate layer, are the network’s trainable parameters only those up to the intermediate layer, or all of the network’s parameters?

Thanks

I think that in your case you have multiple outputs, as all the intermediate outputs are accumulated in the loss.
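One way to check this concretely: a functional Keras Model only contains the layers that lie on the path from its inputs to its outputs, so the parameters of anything past the deepest requested output simply don’t appear in it. A small illustration with a keras.applications ResNet50 (the layer name is only illustrative):

    import tensorflow as tf

    backbone = tf.keras.applications.ResNet50(weights=None, include_top=False)
    truncated = tf.keras.Model(inputs=backbone.input,
                               outputs=backbone.get_layer("conv3_block4_out").output)

    print(backbone.count_params())   # parameters of the full backbone
    print(truncated.count_params())  # only parameters up to the chosen intermediate layer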

Check also: