MobileNetV2 Implementation Differences

Hello!

I’m trying to replicate a TensorFlow 1 experiment (TF 1.9) in a TensorFlow 2.5 ecosystem, and I decided to use the Keras implementation provided in `tf.keras.applications.mobilenet_v2.MobileNetV2`.

The original experiment used the Google Research implementation of MobileNetV2.

Original TF1 MobileNetV2 source code:
As far as I can see, this implementation applies L2 regularization through the training scope defined in
(google-research/resolve_ref_exp_elements_ml/deeplab/mobilenet/mobilenet.py at master · google-research/google-research · GitHub):

```python
def training_scope(is_training=True,
                   weight_decay=0.00004,
                   stddev=0.09,
                   dropout_keep_prob=0.8,
                   bn_decay=0.997):
  """Defines Mobilenet training scope.

  Usage:
     with tf.contrib.slim.arg_scope(mobilenet.training_scope()):
       logits, endpoints = mobilenet_v2.mobilenet(input_tensor)

     # the network created will be trainable with dropout/batch norm
     # initialized appropriately.
  Args:
    is_training: if set to False this will ensure that all customizations are
      set to non-training mode. This might be helpful for code that is reused
      across both training/evaluation, but most of the time training_scope with
      value False is not needed. If this is set to None, the parameters are not
      added to the batch_norm arg_scope.
    weight_decay: The weight decay to use for regularizing the model.
    stddev: Standard deviation for initialization, if negative uses xavier.
    dropout_keep_prob: dropout keep probability (not set if equals to None).
    bn_decay: decay for the batch norm moving averages (not set if equals to
      None).

  Returns:
    An argument scope to use via arg_scope.
  """
  # Note: do not introduce parameters that would change the inference
  # model here (for example whether to use bias), modify conv_def instead.
  batch_norm_params = {'decay': bn_decay, 'is_training': is_training}
  if stddev < 0:
    weight_initializer = slim.initializers.xavier_initializer()
  else:
    weight_initializer = tf.truncated_normal_initializer(stddev=stddev)

  # Set weight_decay for weights in Conv and FC layers.
  with slim.arg_scope(
      [slim.conv2d, slim.fully_connected, slim.separable_conv2d],
      weights_initializer=weight_initializer,
      normalizer_fn=slim.batch_norm), \
      slim.arg_scope([mobilenet_base, mobilenet], is_training=is_training), \
      safe_arg_scope([slim.batch_norm], **batch_norm_params), \
      safe_arg_scope([slim.dropout], is_training=is_training,
                     keep_prob=dropout_keep_prob), \
      slim.arg_scope([slim.conv2d],
                     weights_regularizer=slim.l2_regularizer(weight_decay)), \
      slim.arg_scope([slim.separable_conv2d], weights_regularizer=None) as s:
    return s
```

I’m not very familiar with the slim module (and it is hard to find documentation about it), but as far as I understand, every layer decorated with `@slim.add_arg_scope` picks up the arguments supplied through `slim.arg_scope`.
If I’m right, that would mean all conv2d layers apply L2 regularization with the given `weight_decay` (while `slim.separable_conv2d` is explicitly excluded via `weights_regularizer=None`).
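
Here is a minimal sketch of what I mean, assuming the standalone `tf_slim` pip package (the TF2-compatible port of `tf.contrib.slim`) running in v1 graph mode. Arguments set once on the scope are inherited by every matching layer call inside it, and the L2 penalties are collected into the `REGULARIZATION_LOSSES` collection:

```python
import tensorflow.compat.v1 as tf
import tf_slim as slim

tf.disable_v2_behavior()

inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])

# weights_regularizer set once on the arg_scope is inherited by every
# slim.conv2d call made inside it, which is how training_scope() works.
with slim.arg_scope([slim.conv2d],
                    weights_regularizer=slim.l2_regularizer(0.00004)):
    net = slim.conv2d(inputs, 32, [3, 3], scope='conv1')
    net = slim.conv2d(net, 64, [3, 3], scope='conv2')

# One L2 penalty tensor per conv kernel created under the scope.
print(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))
```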

Looking at the source of the TF2 Keras implementation, this regularization is not applied to any of the Conv2D layers
(tensorflow/tensorflow/python/keras/applications/mobilenet_v2.py at v2.5.0 · tensorflow/tensorflow · GitHub).
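
One quick way to confirm this: a freshly built Keras MobileNetV2 carries no regularization losses at all. This snippet (TF 2.5) just inspects `model.losses`, which is where `kernel_regularizer` penalties would show up:

```python
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights=None)

# model.losses would hold the kernel_regularizer penalties; it is empty.
print(model.losses)  # -> []

# No layer has a kernel_regularizer set either.
print(any(getattr(layer, 'kernel_regularizer', None) is not None
          for layer in model.layers))  # -> False
```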

I would like to know whether these two implementations are really equivalent, and whether anyone knows the reason behind the absence of L2 regularization in the Keras implementation.

Thanks in advance!

Hi @lololouuuu,

As per my understanding, you are correct: the TF1 MobileNetV2 implementation applies L2 regularization to all conv2d layers (with separable_conv2d explicitly excluded), while the TF2 Keras implementation applies none at all.

The reason is presumably that L2 regularization is a training-time choice rather than part of the architecture: it is not always necessary, and it can sometimes have a negative impact on a model’s performance.

It is possible that the Keras team decided that L2 regularization was not necessary for MobileNetV2, or that they wanted to make the implementation simpler.

It is important to note that the absence of L2 regularization does not make the two models architecturally different: the layers and weights match, only the training-time penalty differs, so the two implementations can still achieve similar results. It is therefore a good idea to experiment with L2 regularization and see whether it improves the performance of your model; a sketch of how to add it follows.
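
If you want to get closer to the TF1 behaviour, one possible approach (a sketch, assuming TF 2.5 and the default `weight_decay=0.00004` from `training_scope`) is to attach an L2 penalty to each Conv2D kernel with `add_loss`, skipping the depthwise kernels just as the slim scope sets `weights_regularizer=None` for `separable_conv2d`:

```python
import tensorflow as tf

WEIGHT_DECAY = 0.00004  # default weight_decay in the TF1 training_scope

model = tf.keras.applications.MobileNetV2(weights=None)
regularizer = tf.keras.regularizers.l2(WEIGHT_DECAY)

for layer in model.layers:
    # Regularize regular/pointwise conv kernels only; skip depthwise
    # kernels, mirroring weights_regularizer=None on separable_conv2d.
    if isinstance(layer, tf.keras.layers.Conv2D) and not isinstance(
            layer, tf.keras.layers.DepthwiseConv2D):
        layer.add_loss(lambda layer=layer: regularizer(layer.kernel))

# The penalties now appear in model.losses and are added to the training
# loss automatically by compile()/fit().
print(len(model.losses))
```

Note that simply assigning `layer.kernel_regularizer` after the model is built has no effect, because regularizer attributes are only read when a layer’s weights are created; `add_loss` sidesteps that.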

If you find any other helpful information, please share it here so that others can benefit from it.

I hope this helps.