Hi,
In my code I am calling the Adam optimizer as follows:
self.dqn_architecture.optimizer.apply_gradients(zip(dqn_architecture_grads, trainable_vars))
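For context, that call sits inside a @tf.function-decorated training step. A stripped-down, standalone sketch of it (the network, loss and shapes here are just stand-ins for my real ones, and I've dropped the self. since there is no class):

import tensorflow as tf

# Toy stand-in for my network; the first Conv2D layer has the same
# [8, 8, 4, 32] kernel shape that shows up in the error further down.
dqn_architecture = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu",
                           input_shape=(84, 84, 4)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4),
])
dqn_architecture.compile(optimizer=tf.keras.optimizers.Adam(1e-4))

@tf.function
def train_step(states, targets):
    with tf.GradientTape() as tape:
        q_values = dqn_architecture(states, training=True)
        loss = tf.reduce_mean(tf.square(targets - q_values))  # stand-in loss
    trainable_vars = dqn_architecture.trainable_variables
    # tape.gradient returns a Python list of tensors, one per variable
    dqn_architecture_grads = tape.gradient(loss, trainable_vars)
    dqn_architecture.optimizer.apply_gradients(
        zip(dqn_architecture_grads, trainable_vars))
    return loss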
But I noticed the following showing up in my logs:
2023-02-17 20:05:44,776 5 out of the last 5 calls to <function _BaseOptimizer._update_step_xla at 0x7f55421ab6d0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
2023-02-17 20:05:44,822 6 out of the last 6 calls to <function _BaseOptimizer._update_step_xla at 0x7f55421ab6d0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
On further investigation, I found that I am passing Python lists of tensors to the optimizer rather than tensors, i.e. cause (3) from the warning.
I've also noticed what looks like a memory leak: my RAM usage keeps growing the longer I train the model. This makes sense, because on Stack Overflow I read that:
'Passing python scalars or lists as arguments to tf.function will always build a new graph. To avoid this, pass numeric arguments as Tensors whenever possible'
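I confirmed the quoted behaviour with the standard toy example from the tf.function guide (not my code, just a sanity check):

import tensorflow as tf

@tf.function
def double(x):
    print("tracing with", x)  # only runs when a new graph is traced
    return x * 2

double(1.0)               # traces
double(2.0)               # traces again: new Python scalar, new graph
double(tf.constant(1.0))  # traces once for this dtype/shape
double(tf.constant(2.0))  # reuses the graph, no retrace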
So I believe the solution would be to pass a single tensor containing these tensors rather than a Python list. But when I try to convert the list to a tensor with tf.convert_to_tensor(), I get the error:
'Shapes of all inputs must match: values[0].shape = [8,8,4,32] != values[1].shape = [32] [Op:Pack] name: packed'
because the gradient tensors have different ranks and shapes.
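Here is a minimal reproduction, with dummy gradients standing in for the real conv-kernel and bias gradients:

import tensorflow as tf

kernel_grad = tf.zeros([8, 8, 4, 32])  # conv kernel gradient
bias_grad = tf.zeros([32])             # bias gradient

# A dense tensor can't hold entries with mismatched shapes, so packing fails:
packed = tf.convert_to_tensor([kernel_grad, bias_grad])
# -> Shapes of all inputs must match: values[0].shape = [8,8,4,32] != values[1].shape = [32]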
I've also tried using tf.ragged.constant, but I get:
raise ValueError("all scalar values must have the same nesting depth")
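Roughly what I tried there (dummy gradients again, converted to numpy so tf.ragged.constant can nest them):

import tensorflow as tf

kernel_grad = tf.zeros([8, 8, 4, 32])
bias_grad = tf.zeros([32])

# Ragged tensors allow rows of different *lengths*, not values of different
# *ranks*: the 4-D kernel gradient puts its scalars five levels deep while
# the 1-D bias gradient puts them two levels deep, hence the ValueError.
ragged = tf.ragged.constant([kernel_grad.numpy(), bias_grad.numpy()])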
Any help would be appreciated; I really need to get this sorted.