I’m trying to find a way in TF2 to use the tf.keras.layers.BatchNormalization layer in training mode (i.e. normalizing using the statistics of the current batch) but without updating the moving mean and variance (for some batches, not all).
In TF1, using tf.layers.batch_normalization, you could do something like
x = my_first_inputs   # I want to use these data for updating the moving statistics
y = my_second_inputs  # I do not want to use these data for updating the moving statistics

out_x = my_model(x, training=True)
# Collect the update ops now, so that only the ops created for x are included
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
out_y = my_model(y, training=True)
…
train_op = …  # gradient step to minimize the loss
…
with tf.control_dependencies([train_op]):
    train_op = tf.group(*update_ops)
session.run(train_op)
Does anyone have an idea of how to replicate this in TF2?
Unfortunately I can’t really find a solution for my problem in the migration guide. All it says is that the moving statistics for BatchNorm will be updated automatically in TF2 when calling with “training=True”, which is what I don’t want.
I am also unfortunately not familiar enough with all the inner mechanics of TF2 to understand how the snippet from eager_utils.py helps me.
I think I finally found a fairly good solution to this. Posting in case anyone with the same problem finds this thread.
One cause of my original problem is that tf.keras.layers.BatchNormalization has custom behavior for layer.trainable = False. From the docs:
However, in the case of the BatchNormalization layer, setting trainable = False on the layer means that the layer will be subsequently run in inference mode (meaning that it will use the moving mean and the moving variance to normalize the current batch, rather than using the mean and variance of the current batch).
This behavior has been introduced in TensorFlow 2.0, in order to enable layer.trainable = False to produce the most commonly expected behavior in the convnet fine-tuning use case.
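For reference, here is a minimal sketch (my own illustration, not from the docs) of that default behavior: with trainable = False, the stock layer normalizes with the moving statistics even when it is called with training=True.

import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
bn.trainable = False

x = tf.constant(np.random.randn(8, 4), dtype=tf.float32)
out = bn(x, training=True)

# Freshly initialized moving statistics are mean = 0 and variance = 1, so the
# output is (up to epsilon) just the input, not a per-batch standardization.
print(np.allclose(out.numpy(), x.numpy(), atol=1e-2))  # True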
This behavior can be disabled by subclassing tf.keras.layers.BatchNormalization and overriding the _get_training_value() method:
import tensorflow as tf
from tensorflow.keras import backend


class MyBatchNorm(tf.keras.layers.BatchNormalization):

    def _get_training_value(self, training=None):
        if training is None:
            training = backend.learning_phase()
        if self._USE_V2_BEHAVIOR:
            if isinstance(training, int):
                training = bool(training)
            # if not self.trainable:
            #     # When the layer is not trainable, it overrides the value passed
            #     # from model.
            #     training = False
        return training
Note that the custom behavior for layer.trainable is disabled by commenting out those four lines.
We can then normalize with the current batch statistics, without updating the moving statistics, using something like
model = MyBatchNorm()
model.trainable = False        # trainable = False still skips the moving-average updates
out = model(x, training=True)  # but the current batch statistics are now used for normalization
whereas the unmodified tf.keras.layers.BatchNormalization would use the moving statistics in this call.
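As a quick sanity check, here is a sketch of how one could verify this, using the MyBatchNorm class defined above. It assumes a TF 2.x version in which Layer.add_update skips the moving-average updates for frozen (trainable = False) layers, which is what makes this trick work; the data and tolerances are arbitrary.

import numpy as np
import tensorflow as tf

bn = MyBatchNorm()
bn.trainable = False

# Data that is clearly not standard normal, so the effect is visible.
x = tf.constant(np.random.randn(256, 4) * 5.0 + 3.0, dtype=tf.float32)
out = bn(x, training=True)

# The output is normalized with the statistics of this batch ...
print(np.allclose(out.numpy().mean(axis=0), 0.0, atol=1e-3))  # True
print(np.allclose(out.numpy().std(axis=0), 1.0, atol=1e-2))   # True
# ... while the moving statistics keep their initial values,
# i.e. the call did not update them.
print(np.allclose(bn.moving_mean.numpy(), 0.0))       # True
print(np.allclose(bn.moving_variance.numpy(), 1.0))   # True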