Hi community,

I’m starting to learn TensorFlow and I’m following the code examples from here:

```
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Dense, Flatten

class MyModel(tf.keras.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = Conv2D(32, 3, activation='relu')
        self.flatten = Flatten()
        self.d1 = Dense(128, activation='relu')
        self.d2 = Dense(10, activation='softmax')

    def call(self, x):
        x = self.conv1(x)
        x = self.flatten(x)
        x = self.d1(x)
        return self.d2(x)

model = MyModel()
loss = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

# images, labels: one batch from the dataset
with tf.GradientTape() as tape:
    logits = model(images)
    # Keras losses expect (y_true, y_pred)
    loss_value = loss(labels, logits)
grads = tape.gradient(loss_value, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```

What I don’t understand is: if I have several inputs, how does a single `loss_value` give me all the gradients?
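To make the question concrete, here is (in plain Python, with made-up numbers) what I assume happens to the per-example losses in a batch; my assumption is that they get averaged into one scalar:

```python
# Sketch of my assumption: a batch of inputs produces one loss per input,
# but the reported loss_value is just their average -- a single number.
per_example_losses = [0.5, 1.5, 1.0]   # made-up losses for a batch of three inputs
loss_value = sum(per_example_losses) / len(per_example_losses)
print(loss_value)                       # 1.0 -- a single scalar
```

So the tape only ever sees one scalar, yet `tape.gradient` still returns a gradient for every variable, which is the part that confuses me.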

My current understanding of backpropagation is that a single input gives a single output, which gives a single loss value, which is backpropagated to give you a single weight gradient.

This chapter from Nielsen's book helped me understand backpropagation using Python and NumPy: http://neuralnetworksanddeeplearning.com/chap2.html
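To show my current mental model concretely, here is a toy single-input sketch in plain Python (made-up model and numbers, not from the tutorial or from Nielsen's code):

```python
# Toy one-weight "network": y = w * x, squared-error loss.
# One input -> one output -> one loss -> one gradient per weight.
w = 2.0
x, target = 3.0, 0.0           # a single input and its label

y = w * x                      # single output: 6.0
loss = (y - target) ** 2       # single loss value: 36.0
grad_w = 2 * (y - target) * x  # backprop: d(loss)/dw = 36.0

print(loss, grad_w)            # 36.0 36.0
```

This single-input picture is exactly what I can't reconcile with the batched `loss_value` in the TensorFlow code above.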

Could someone please point me to resources that explain how TensorFlow uses a single `loss_value` to actually give you the gradients for all points in a dataset?