I am trying to reimplement inverting gradients, as described in this paper: https://arxiv.org/pdf/1511.04143.pdf, with GradientTape in TensorFlow 2.7. In this example I use the Pendulum domain, which has an observation size of 3, an action size of 1, and no discrete actions.
Someone solved it for TensorFlow 1.0 here: How to implement inverting gradient in Tensorflow? (Stack Overflow)
But I am struggling to reimplement it for TensorFlow 2.
As far as I understand, we need the derivative of Q(s, a(w, s)) with respect to w, i.e. (dq/da) * (da/dw), with w being the weights of the policy network.
This is needed to update the weights of the policy network.
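Without the inversion step, this chain rule is what a single gradient call already computes end to end. A minimal sketch of that plain update, using the same networks as in my code below:

import tensorflow as tf

with tf.GradientTape() as tape:
    actions = self.policy_Net(states)
    q, _, _ = self.value_Net(states, actions)
    loss = -tf.math.reduce_mean(q)
# dloss/dw = (dloss/da) * (da/dw), fused into one backward pass
grads = tape.gradient(loss, self.policy_Net.trainable_variables)

But for inverting gradients I need to modify dq/da before it is multiplied with da/dw, so I split the computation up.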
So we can access the derivative dq/da via:
dq_das = tf.Variable(tape.gradient(loss, actions))
Now we can calculate the inverting gradients. The shape matches the actions, and this part computes without problems:
upper = 1
lower = -1
for i in range(dq_das.shape[0]):
    dq_da = dq_das[i]
    action = actions[i]
    if dq_da >= 0.0:
        dq_das[i].assign(dq_da * (upper - action) / (upper - lower))
    else:
        dq_das[i].assign(dq_da * (action - lower) / (upper - lower))
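For what it's worth, I think the same rule can also be written without the Python loop (a sketch, assuming dq_das and actions both have shape (batch_size, action_dim)):

# scale positive gradients by the distance to the upper bound,
# negative gradients by the distance to the lower bound
inverted_dq_das = tf.where(
    dq_das >= 0.0,
    dq_das * (upper - actions) / (upper - lower),
    dq_das * (actions - lower) / (upper - lower))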
We can access the derivative da/dw via:
da_dw = tape.gradient(actions, self.policy_Net.trainable_variables)
The problem now is that the shapes don't fit when I want to calculate dq_da * da_dw. For dq_da I get:
<tf.Variable 'Variable:0' shape=(124, 1) dtype=float32>
which makes sense, since the batch size is 124 and there is one action. And for da_dw I get:
[<tf.Tensor 'gradient_tape/policy__network/dense/MatMul_1:0' shape=(3, 400) dtype=float32>, <tf.Tensor 'gradient_tape/policy__network/dense/BiasAdd/BiasAddGrad_1:0' shape=(400,) dtype=float32>, <tf.Tensor 'gradient_tape/policy__network/dense_1/MatMul_3:0' shape=(403, 300) dtype=float32>, <tf.Tensor 'gradient_tape/policy__network/dense_1/BiasAdd/BiasAddGrad_1:0' shape=(300,) dtype=float32>, <tf.Tensor 'gradient_tape/policy__network/dense_2/MatMul_3:0' shape=(703, 1) dtype=float32>, <tf.Tensor 'gradient_tape/policy__network/dense_2/BiasAdd/BiasAddGrad_1:0' shape=(1,) dtype=float32>]
Where is my mistake? Thanks a lot!
My code looks like this so far:
@tf.function
def __inv_Grads__(self, states):
    #states = tf.Variable(states)
    with tf.GradientTape(persistent=True) as tape:
        actions = self.policy_Net(states)
        q, _, _ = self.value_Net(states, actions)
        loss = -tf.reduce_sum(q, axis=1, keepdims=True)
        loss = tf.math.reduce_mean(loss)
    # dq/da, wrapped in a Variable so the slices can be reassigned below
    dq_das = tf.Variable(tape.gradient(loss, actions))
    da_dw = tape.gradient(actions, self.policy_Net.trainable_variables)
    inverting_gradients = []
    upper = 1
    lower = -1
    # rescale each gradient depending on its sign (inverting gradients rule)
    for i in range(dq_das.shape[0]):
        dq_da = dq_das[i]
        action = actions[i]
        if dq_da >= 0.0:
            dq_das[i].assign(dq_da * (upper - action) / (upper - lower))
        else:
            dq_das[i].assign(dq_da * (action - lower) / (upper - lower))
    print(dq_das)
    print(da_dw)
    print(dq_das * da_dw)  # this is where the shapes don't fit
    exit()
    return 0
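One idea I had, but have not verified: tf.GradientTape.gradient accepts an output_gradients argument, which is multiplied into the backward pass at the target. If that is the right mechanism, the product (dq/da) * (da/dw) could maybe be formed like this instead of an explicit multiplication (a sketch, with inverted_dq_das being the modified dq/da of shape (124, 1)):

# backpropagate the modified dq/da through the policy network;
# the result is summed over the batch, one tensor per trainable variable
policy_grads = tape.gradient(
    actions,
    self.policy_Net.trainable_variables,
    output_gradients=inverted_dq_das)

Is that the intended way, or is there a better one?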