Given the same model, I found that the FLOPs calculated in PyTorch and TensorFlow are different. I used keras_flops (keras-flops · PyPI) in TensorFlow and ptflops (ptflops · PyPI) in PyTorch to calculate FLOPs. The PyTorch number is close to my own hand calculation. Does TensorFlow apply some tricks to speed up the computation, so that fewer FLOPs are measured? My model in TensorFlow:
from tensorflow.keras.layers import Input, Conv2D, PReLU, Conv2DTranspose
from tensorflow.keras.models import Model

d = 56
s = 12
inp = Input((750, 750, 1))
x = Conv2D(d, (5, 5), padding='same')(inp)
x = PReLU()(x)
x = Conv2D(s, (1, 1), padding='valid')(x)
x = PReLU()(x)
x = Conv2D(s, (3, 3), padding='same')(x)
x = PReLU()(x)
x = Conv2D(s, (3, 3), padding='same')(x)
x = PReLU()(x)
x = Conv2D(s, (3, 3), padding='same')(x)
x = PReLU()(x)
x = Conv2D(s, (3, 3), padding='same')(x)
x = PReLU()(x)
x = Conv2D(d, (1, 1), padding='same')(x)
x = PReLU()(x)
out = Conv2DTranspose(1, (9, 9), strides=(4, 4), padding='same', output_padding=3)(x)
model = Model(inp, out)
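For reference, here is a minimal sketch of the hand calculation I compare against (assumptions: one MAC per multiply-add, biases and PReLU ignored, and the transposed convolution counted as a dense 9x9 convolution over the 4x-upsampled 3000x3000 output grid):

def conv2d_macs(h, w, c_in, c_out, k):
    # Each output pixel needs k*k*c_in multiply-adds per output channel.
    return h * w * c_in * c_out * k * k

H = W = 750
d, s = 56, 12
macs  = conv2d_macs(H, W, 1, d, 5)          # 5x5 feature extraction, ~0.79 GMac
macs += conv2d_macs(H, W, d, s, 1)          # 1x1 shrink,             ~0.38 GMac
macs += 4 * conv2d_macs(H, W, s, s, 3)      # four 3x3 mappings,      ~2.92 GMac
macs += conv2d_macs(H, W, s, d, 1)          # 1x1 expand,             ~0.38 GMac
macs += conv2d_macs(4 * H, 4 * W, d, 1, 9)  # transposed conv,        ~40.82 GMac
print(f"{macs / 1e9:.2f} GMac -> {2 * macs / 1e9:.2f} GFLOPs")  # ~45.28 GMac

(If the transposed conv is instead counted once per input pixel, that layer comes to ~2.55 GMac, i.e. roughly the 5.10b float_ops that TensorFlow attributes to Conv2DBackpropInput below.)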
The FLOPs report from TensorFlow:
node name | # float_ops
Conv2D 8.92b float_ops (100.00%, 61.95%)
Conv2DBackpropInput 5.10b float_ops (38.05%, 35.44%)
Neg 180.00m float_ops (2.61%, 1.25%)
BiasAdd 105.75m float_ops (1.36%, 0.73%)
Mul 90.00m float_ops (0.63%, 0.63%)
======================End of Report==========================
The total is 14.3 GFLOPs.
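For completeness, this is how I obtained the TensorFlow number (a sketch; keras-flops wraps the TF profiler and reports total float_ops for a batch size of 1):

from keras_flops import get_flops

flops = get_flops(model, batch_size=1)
print(f"The FLOPs is: {flops / 1e9:.1f} GFLOPs")  # prints ~14.3 GFLOPs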
However, the FLOPs in PyTorch are:
Model_1(
  0.013 M, 100.000% Params, 45.486 GMac, 100.000% MACs,
  (begin): Sequential(
    0.002 M, 11.804% Params, 0.851 GMac, 1.870% MACs,
    (0): Conv2d(0.001 M, 11.367% Params, 0.819 GMac, 1.801% MACs, 1, 56, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): PReLU(0.0 M, 0.437% Params, 0.032 GMac, 0.069% MACs, num_parameters=56)
  )
  (middle): Sequential(
    0.007 M, 52.775% Params, 3.803 GMac, 8.360% MACs,
    (0): Conv2d(0.001 M, 5.340% Params, 0.385 GMac, 0.846% MACs, 56, 12, kernel_size=(1, 1), stride=(1, 1))
    (1): PReLU(0.0 M, 0.094% Params, 0.007 GMac, 0.015% MACs, num_parameters=12)
    (2): Conv2d(0.001 M, 10.212% Params, 0.736 GMac, 1.618% MACs, 12, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): PReLU(0.0 M, 0.094% Params, 0.007 GMac, 0.015% MACs, num_parameters=12)
    (4): Conv2d(0.001 M, 10.212% Params, 0.736 GMac, 1.618% MACs, 12, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): PReLU(0.0 M, 0.094% Params, 0.007 GMac, 0.015% MACs, num_parameters=12)
    (6): Conv2d(0.001 M, 10.212% Params, 0.736 GMac, 1.618% MACs, 12, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): PReLU(0.0 M, 0.094% Params, 0.007 GMac, 0.015% MACs, num_parameters=12)
    (8): Conv2d(0.001 M, 10.212% Params, 0.736 GMac, 1.618% MACs, 12, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): PReLU(0.0 M, 0.094% Params, 0.007 GMac, 0.015% MACs, num_parameters=12)
    (10): Conv2d(0.001 M, 5.684% Params, 0.409 GMac, 0.900% MACs, 12, 56, kernel_size=(1, 1), stride=(1, 1))
    (11): PReLU(0.0 M, 0.437% Params, 0.032 GMac, 0.069% MACs, num_parameters=56)
  )
  (final): ConvTranspose2d(0.005 M, 35.420% Params, 40.833 GMac, 89.770% MACs, 56, 1, kernel_size=(9, 9), stride=(4, 4), padding=(4, 4), output_padding=(3, 3))
)
Computational complexity: 45.49 GMac
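And this is how I obtained the PyTorch number (a sketch; Model_1 stands for my equivalent nn.Module, which is not shown here). Note that ptflops reports multiply-accumulates, and at roughly 2 FLOPs per MAC, 45.49 GMac corresponds to ~91 GFLOPs, so the gap to TensorFlow's 14.3 GFLOPs is even larger than it looks:

from ptflops import get_model_complexity_info

macs, params = get_model_complexity_info(
    Model_1(), (1, 750, 750),   # (C, H, W) of one input sample
    as_strings=True,
    print_per_layer_stat=True,  # prints the per-layer table above
)
print(f"Computational complexity: {macs}")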