Hello,
I had a question regarding the behavior of tf.gradients() as opposed to tf.GradientTape.gradient() in graph mode.
Given a differentiable function y = f(x), where x and y are each single TensorFlow tensors, is there any difference between the behavior of tf.gradients(y, x) and tape.gradient(y, x), where tape is an instance of tf.GradientTape (assuming the use of graph mode)?
I'm not sure why TensorFlow has two different gradient methods that can be used in graph mode; maybe there are some subtle differences in the implementations? I've looked at the documentation for tf.GradientTape and tf.gradients, but it isn't clear whether the behavior of these methods differs for a single (x, y) pair, or whether tf.gradients() can simply be used in this case for a speedup when using graph mode.
Thank you so much for your help!
Hi @Mihir_Khambete
Welcome to the TensorFlow Forum!
tf.gradients() and tf.GradientTape.gradient() are both used to compute gradients.

tf.gradients() is only valid in a graph context and was mainly used in TF v1. In particular, it is valid in the context of a tf.function wrapper, where code executes as a graph.
import tensorflow as tf

x = tf.Variable(3.0)

@tf.function
def example():
    # Inside tf.function the code runs as a graph, so tf.gradients() is valid
    y = x**2
    return tf.gradients(y, x)

example()
# Output: [<tf.Tensor: shape=(), dtype=float32, numpy=6.0>]
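By contrast, calling tf.gradients() outside a graph context, i.e., under eager execution (the TF2 default), raises a RuntimeError directing you to tf.GradientTape. A minimal sketch of that failure mode, assuming default eager execution:

import tensorflow as tf

x = tf.Variable(3.0)
y = x**2  # executed eagerly, so no graph is being traced

# Raises a RuntimeError along the lines of:
# "tf.gradients is not supported when eager execution is enabled.
#  Use tf.GradientTape instead."
tf.gradients(y, x)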
tf.GradientTape.gradient() is more flexible for automatic differentiation: it "records" the relevant operations executed inside the context of a tf.GradientTape onto a "tape", whereas tf.gradients() computes the gradient from the graph without using a tape.
import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    y = x**2

# dy/dx = 2x, evaluated at x = 3.0
dy_dx = tape.gradient(y, x)
dy_dx
# Output: <tf.Tensor: shape=(), dtype=float32, numpy=6.0>
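As for your specific question: for a single (x, y) pair in graph mode, both APIs should return the same gradient value. Here is a minimal sketch (the function name compare is just illustrative) that computes the gradient both ways inside one tf.function so you can check this yourself:

import tensorflow as tf

x = tf.Variable(3.0)

@tf.function
def compare():
    # Graph-only API
    y1 = x**2
    g1 = tf.gradients(y1, x)[0]
    # Tape-based API, which also works inside a graph
    with tf.GradientTape() as tape:
        y2 = x**2
    g2 = tape.gradient(y2, x)
    return g1, g2

g1, g2 = compare()
print(g1.numpy(), g2.numpy())  # both should print 6.0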
Please refer to the linked documentation for tf.gradients and tf.GradientTape for a more detailed understanding of this. Thank you.