Best way to compute Hessian-vector product?

Cheerful_Squirrel · November 19, 2021, 2:23pm

I am trying to compute a Hessian-vector product. The Advanced automatic differentiation guide (Advanced automatic differentiation | TensorFlow Core) links to:

github.com

tensorflow/tensorflow/blob/master/tensorflow/python/eager/benchmarks/resnet50/hvp_test.py

# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests and benchmarks for Hessian-vector products with ResNet50."""

import gc
import time

from absl.testing import parameterized

This file has been truncated.

However, this has half a dozen different ways of implementing it! Which is most efficient? I tried to run the benchmark to get those numbers, but I couldn’t figure out how.

Renu_Patel · November 29, 2023, 12:03pm

Hi @Cheerful_Squirrel

You can use tf.GradientTape.jacobian to compute the Hessian, from which the Hessian-vector product can be obtained. Please refer to this Hessian matrix example using tf.GradientTape.jacobian:


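import tensorflow as tf

x = tf.random.normal([7, 5])
layer1 = tf.keras.layers.Dense(8, activation=tf.nn.relu)
layer2 = tf.keras.layers.Dense(6, activation=tf.nn.relu)

# Nest two tapes: t1 records the forward pass, t2 records the gradient computation.
with tf.GradientTape() as t2:
  with tf.GradientTape() as t1:
    x = layer1(x)
    x = layer2(x)
    loss = tf.reduce_mean(x**2)
  # First-order gradient, taken inside t2 so it can be differentiated again.
  g = t1.gradient(loss, layer1.kernel)
# Jacobian of the gradient with respect to the kernel, i.e. the Hessian.
h = t2.jacobian(g, layer1.kernel)

print(f'layer.kernel.shape: {layer1.kernel.shape}')
print(f'h.shape: {h.shape}')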

Output:

layer.kernel.shape: (5, 8)
h.shape: (5, 8, 5, 8)
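
Note that the jacobian call above materializes the full Hessian (here 5 x 8 x 5 x 8 entries), which grows quickly with the number of parameters. If only the product with a vector is needed, it can be computed without building the Hessian at all. Below is a minimal sketch of two of the variants that appear in the linked hvp_test.py (back-over-back and forward-over-back), rewritten for a toy function rather than ResNet50. The function f, the helper names hvp_back_over_back and hvp_forward_over_back, and the values of x and v are made up for illustration; which variant is faster will depend on the model and hardware, so it is worth timing both on your own problem.

```python
import tensorflow as tf


def f(x):
    # Toy scalar function (an assumption for illustration).
    # Its Hessian is diag(6 * x), so the exact HVP is 6 * x * v.
    return tf.reduce_sum(x ** 3)


def hvp_back_over_back(fn, x, v):
    # Reverse mode twice: differentiate the gradient again,
    # weighting it by v through output_gradients.
    with tf.GradientTape() as outer:
        with tf.GradientTape() as inner:
            y = fn(x)
        g = inner.gradient(y, x)
    return outer.gradient(g, x, output_gradients=v)


def hvp_forward_over_back(fn, x, v):
    # Forward mode over reverse mode: push the tangent v through
    # the gradient computation with a ForwardAccumulator.
    with tf.autodiff.ForwardAccumulator(primals=x, tangents=v) as acc:
        with tf.GradientTape() as tape:
            y = fn(x)
        g = tape.gradient(y, x)
    return acc.jvp(g)


x = tf.Variable([1.0, 2.0, 3.0])   # point at which the Hessian is taken
v = tf.constant([1.0, 0.5, -1.0])  # vector to multiply the Hessian by

hvp_bb = hvp_back_over_back(f, x, v)
hvp_fb = hvp_forward_over_back(f, x, v)

print(hvp_bb.numpy())
print(hvp_fb.numpy())
```

Both results should agree with the analytic value 6 * x * v for this toy function. For a Keras model, the same pattern should apply with x replaced by a list of variables (for example model.trainable_variables) and v by a matching list of tensors with the same shapes.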
