Model: 8 hidden layers, hidden size 20, input size 2, output size 1; TF 2.4.0.
I am confused: why do different ways of calculating the second derivative give different results?
@tf.function
def get_vanilla_hess(model, xs):
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(xs)
        ys = model(xs)
        xbar = tape.gradient(ys, xs)
        xbarbar = tape.batch_jacobian(xbar, xs)
    return (ys, xbar, xbarbar)
print(get_vanilla_hess(vanilla_model, X_r)[-1][:, 0, 0])
returns
[-0.0004067 , -0.00038697, -0.00037729, ..., -0.00035329,
-0.00038197, -0.00038998]
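To sanity-check what the batch_jacobian variant computes, here is a minimal toy sketch (using a simple analytic function y = x1^2 * x2 in place of the actual vanilla_model and X_r): tape.batch_jacobian(xbar, xs) gives a per-sample (2, 2) Hessian, and its [:, 0, 0] slice matches the analytic d^2y/dx1^2 = 2*x2.

import tensorflow as tf

xs_toy = tf.constant([[1.0, 2.0], [3.0, 4.0]])

@tf.function
def toy_hess(xs):
    # Same structure as get_vanilla_hess, but with y = x1^2 * x2 so the
    # Hessian is known analytically.
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(xs)
        ys = xs[:, 0] ** 2 * xs[:, 1]
        xbar = tape.gradient(ys, xs)              # (batch, 2) first derivatives
        xbarbar = tape.batch_jacobian(xbar, xs)   # (batch, 2, 2) per-sample Hessian
    return xbarbar

print(toy_hess(xs_toy)[:, 0, 0])  # analytic d^2y/dx1^2 = 2*x2 -> [4., 8.]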
By contrast,
@tf.function
def get_vanilla_hess_alt(model, xs):
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(xs)
        ys = model(xs)
        xbar = tape.gradient(ys, xs)
        xbarbar = tape.gradient(xbar, xs)
    return (ys, xbar, xbarbar)
print(get_vanilla_hess_alt(vanilla_model, X_r)[-1][:, 0])
returns
[-0.00036503, -0.00033761, -0.00032976, ..., -0.00029553,
-0.00032992, -0.00034215]
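On the same toy function, the gradient-of-gradient variant gives different numbers: if I understand the tape semantics correctly, when the target passed to tape.gradient() is not a scalar, TF differentiates its sum, so this returns the Hessian summed over one index (d^2y/dx1^2 + d^2y/dx1dx2) rather than the pure diagonal entry, which would explain the discrepancy above. A sketch under the same toy assumptions:

@tf.function
def toy_hess_alt(xs):
    # Same toy function y = x1^2 * x2, but second derivative via a second
    # tape.gradient() call instead of batch_jacobian().
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(xs)
        ys = xs[:, 0] ** 2 * xs[:, 1]
        xbar = tape.gradient(ys, xs)        # (batch, 2) first derivatives
        xbarbar = tape.gradient(xbar, xs)   # (batch, 2): d(sum of xbar)/dx per sample
    return xbarbar

print(toy_hess_alt(xs_toy)[:, 0])  # 2*x2 + 2*x1 -> [6., 14.], not [4., 8.]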
Also: a manually constructed graph for computing the Hessian returns
[-0.00040658, -0.00038687, -0.00037727, ..., -0.0003532 ,
-0.00038189, -0.00039003]
Does tape.gradient() + tape.gradient() return the same result as tape.gradient() + tape.batch_jacobian() on the diagonal (d^2f/dx^2)?