Hi community,

I’m starting to learn TensorFlow and I’m following the code examples from here:

```
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Dense, Flatten

class MyModel(tf.keras.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = Conv2D(32, 3, activation='relu')
        self.flatten = Flatten()
        self.d1 = Dense(128, activation='relu')
        self.d2 = Dense(10, activation='softmax')

    def call(self, x):
        x = self.conv1(x)
        x = self.flatten(x)
        x = self.d1(x)
        return self.d2(x)

model = MyModel()
loss = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

# images, labels: one batch from the dataset
with tf.GradientTape() as tape:
    logits = model(images)
    # Keras losses expect (y_true, y_pred)
    loss_value = loss(labels, logits)
grads = tape.gradient(loss_value, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```

What I don’t understand is: if I have several inputs, how does a single `loss_value` give me all the gradients?
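To make the question concrete, here is (in plain Python, with made-up numbers) what I assume happens to the per-example losses in a batch; my assumption is that they get averaged into one scalar:

```python
# Sketch of my assumption: a batch of inputs produces one loss per input,
# but the reported loss_value is just their average -- a single number.
per_example_losses = [0.5, 1.5, 1.0]   # made-up losses for a batch of three inputs
loss_value = sum(per_example_losses) / len(per_example_losses)
print(loss_value)                       # 1.0 -- a single scalar
```

So the tape only ever sees one scalar, yet `tape.gradient` still returns a gradient for every variable, which is the part that confuses me.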

My current understanding of backpropagation is that a single input gives a single output, which gives a single loss value, which is backpropagated to give you a single weight gradient.

This chapter from Nielsen's book helped me understand backpropagation using Python and NumPy: http://neuralnetworksanddeeplearning.com/chap2.html
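To show my current mental model concretely, here is a toy single-input sketch in plain Python (made-up model and numbers, not from the tutorial or from Nielsen's code):

```python
# Toy one-weight "network": y = w * x, squared-error loss.
# One input -> one output -> one loss -> one gradient per weight.
w = 2.0
x, target = 3.0, 0.0           # a single input and its label

y = w * x                      # single output: 6.0
loss = (y - target) ** 2       # single loss value: 36.0
grad_w = 2 * (y - target) * x  # backprop: d(loss)/dw = 36.0

print(loss, grad_w)            # 36.0 36.0
```

This single-input picture is exactly what I can't reconcile with the batched `loss_value` in the TensorFlow code above.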

Could someone please point me to resources that explain how TensorFlow uses a single `loss_value` to actually give you the gradients for all points in a dataset?