How to define BatchNorm3d in TensorFlow

PyTorch has three types of BatchNorm layers: BatchNorm1d, BatchNorm2d and BatchNorm3d. TensorFlow has only one BatchNormalization layer.

Are these layers equivalent? I mean, can the three PyTorch layers be defined using that layer from TensorFlow? Are there any parameters of TensorFlow layer that need to be adjusted to have it achieve the behavior of PyTorch layers?

Hi @Nada-Nada, in PyTorch, BatchNorm1d applies batch normalization over a 2D or 3D input, BatchNorm2d over a 4D input, and BatchNorm3d over a 5D input. In TensorFlow, the single BatchNormalization layer accepts inputs of any of these dimensionalities. Thank You.

Many thanks @Kiran_Sai_Ramineni
One question please: do I need to set the axis parameter of TensorFlow depending on the dimensionality of the input data, or can I still set it to -1 (the default value) for the different input dimensionalities? (I am just trying to replicate PyTorch behavior.)

Hi @Nada-Nada, yes, the value of the axis parameter depends on which dimension of the input the normalization should be applied over. Thank You.
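As a sketch of what this means in practice (my own example): PyTorch's BatchNormXd layers always normalize over the channel axis, which is position 1 in PyTorch's channels-first layout (N, C, ...), while TensorFlow's default axis=-1 assumes a channels-last layout (N, ..., C). So on channels-first data you would pass axis=1:

```python
import numpy as np
import tensorflow as tf

x = np.random.randn(4, 8, 16, 16).astype("float32")  # (N, C, H, W), channels-first

# axis=1 marks position 1 as the channel dimension, matching the axis
# that PyTorch's BatchNorm2d normalizes over.
bn = tf.keras.layers.BatchNormalization(axis=1)
y = bn(x, training=True)

# After normalization, the per-channel mean over (N, H, W) is ~0.
print(np.abs(y.numpy().mean(axis=(0, 2, 3))).max())
```

Alternatively, you can keep axis=-1 and transpose the data into channels-last order before the layer.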

Thank you @Kiran_Sai_Ramineni
Can you please explain how to set this parameter to replicate the behavior of:
1- BatchNorm1d layer of PyTorch
2- BatchNorm2d layer of PyTorch
3- BatchNorm3d layer of PyTorch

Your help is very much appreciated.

Hi @Nada-Nada, Please refer to this gist for implementation of BatchNormalization on data having different dimensions using torch and Tensorflow. Thank You.

Many thanks @Kiran_Sai_Ramineni for the gist. I can see that the output of TensorFlow BatchNormalization (axis=-1) is different from the one produced by PyTorch's BatchNormXd. Do you have an idea please how to set the axis parameter so that they produce the same normalization output?

Again, thank you very much for your help.

Hi @Nada-Nada, if you look at the momentum argument (the momentum for the moving average), its default value in PyTorch is 0.1, while in TensorFlow it is set to 0.99.

Also, the formula used to update the running statistics is different between PyTorch and TensorFlow, both during training and inference.

Mathematically, the PyTorch update rule for the running statistics is x̂_new = (1 − momentum) × x̂ + momentum × x_t, where x̂ is the estimated statistic and x_t is the new observed value.

In TensorFlow it is moving_mean = moving_mean * momentum + mean(batch) * (1 - momentum).
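The two rules are mirror images of each other: substituting momentum_tf = 1 − momentum_pt makes them identical, which is why PyTorch's default 0.1 corresponds to a TensorFlow momentum of 0.9. A quick numeric check (plain Python, illustrative values of my choosing):

```python
momentum_pt = 0.1              # PyTorch default
momentum_tf = 1 - momentum_pt  # 0.9, the TensorFlow equivalent

running, batch_stat = 5.0, 11.0  # arbitrary running statistic and new batch value

# PyTorch rule: x_new = (1 - momentum) * x_hat + momentum * x_t
new_pt = (1 - momentum_pt) * running + momentum_pt * batch_stat
# TensorFlow rule: moving = moving * momentum + batch * (1 - momentum)
new_tf = running * momentum_tf + batch_stat * (1 - momentum_tf)

print(new_pt, new_tf)  # identical: 5.6 5.6
```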

That might be the reason for the different results. Please refer to this similar issue to know more. Thank You.