I have realized that I can take the gradient of a vector with respect to an input. In other words, I can run:
import numpy as np
import tensorflow as tf

w = tf.Variable([[1., 2.], [3., 4.]], name='w')
x = tf.Variable([[1., 2.]], name='x')

with tf.GradientTape(persistent=True) as tape:
    y = x @ w   # y has shape (1, 2): a vector, not a scalar
    loss = y

grad = tape.gradient(loss, [x])
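Printing grad, I get a single (1, 2) tensor rather than a 2x2 matrix (the exact repr may vary by TF version, but the values should be):

print(grad)
# [<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[3., 7.]], dtype=float32)>]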
Here y is a vector, so I would expect the gradient to be a matrix: I am asking for the derivative of each coordinate of y with respect to each coordinate of x. Since y_j = sum_i x_i * w_ij, each entry dy_j/dx_i is just w_ij, so the result should be the 2x2 Jacobian. What is happening under the hood, given that I am getting a vector instead?
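For comparison, here is a minimal sketch of the matrix I expected, computed with tape.jacobian (same w and x as above, repeated so the snippet stands alone):

import tensorflow as tf

w = tf.Variable([[1., 2.], [3., 4.]], name='w')
x = tf.Variable([[1., 2.]], name='x')

with tf.GradientTape() as tape:
    y = x @ w  # y has shape (1, 2)

# jacobian() keeps one partial derivative per (output coordinate, input coordinate) pair
jac = tape.jacobian(y, x)       # shape (1, 2, 1, 2)
print(tf.reshape(jac, (2, 2)))
# tf.Tensor(
# [[1. 3.]
#  [2. 4.]], shape=(2, 2), dtype=float32)   i.e. tf.transpose(w)

So tape.jacobian gives the full 2x2 matrix, while tape.gradient collapses it to a vector; it is this collapsing behavior I want explained.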