Hi, I am working on an RL model in TensorFlow: a pointer network that outputs a sequence of indices. When training the model, I want to build a custom reward function in which each element of the output sequence is passed individually through a separate function. For example, if the output is [1,2,3,4], I want to pass 1, 2, 3, and 4 individually to a function, say F, that gives a reward value for each of them. However, I get the error:
Cannot convert a symbolic Tensor (strided_slice_1:0) to a numpy array. This error may indicate that you’re trying to pass a Tensor to a NumPy call, which is not supported
I am not able to convert the output into a NumPy array that I can pass to the custom function. I have seen that this can be done directly in PyTorch, but I tried everything I could find on Stack Overflow and elsewhere and could not figure out how to do it in TensorFlow. Let me know if someone can help with this. Some code:
Here I am getting the sequence of indices for a batch:
for step in range(1, self.max_length):  # sample from POINTER
    query = tf.nn.relu(tf.matmul(query1, W_1) + tf.matmul(query2, W_2) + tf.matmul(query3, W_3))
    logits = pointer(encoded_ref=encoded_ref, query=query, mask=self.mask_, W_ref=W_ref, W_q=W_q, v=v, C=self.C, temperature=self.temperature)
    prob = distr.Categorical(logits)  # logits = masked_scores
    idx = prob.sample()
    idx_list.append(idx)  # tour index
    log_probs.append(prob.log_prob(idx))  # log prob
    entropies.append(prob.entropy())  # entropies
    self.mask_ = self.mask_ + tf.one_hot(idx, self.max_length)  # mask
    idx_ = tf.stack([tf.range(self.batch_size, dtype=tf.int32), idx], 1)  # idx with batch
    query3 = query2
    query2 = query1
    query1 = tf.gather_nd(actor_encoding, idx_)  # update trajectory (state)
idx_list.append(idx_list[0])  # return to start
self.tour = tf.stack(idx_list, axis=1)  # permutations
I want to pass this tour (which has size batch size x input dimension x dimension) to the reward function and get back reward values of size [batch].
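To make the goal concrete, here is a minimal sketch of the kind of wrapper I think I need. It assumes tf.numpy_function is available (TF 1.14+ or 2.x) and that the tour is an int32 tensor of shape [batch, tour_len]. F and tour_reward are hypothetical stand-ins for my real reward code, and the constant tensor stands in for self.tour. I am not sure this is the right approach.

```python
import numpy as np
import tensorflow as tf


def F(index):
    # Hypothetical per-index reward; stands in for the real function.
    return 0.5 * float(index)


def tour_reward(tour_np):
    # Runs as ordinary Python/NumPy: tour_np is an ndarray [batch, tour_len].
    # Each index is passed to F individually; rewards are summed per tour.
    return np.array([sum(F(i) for i in row) for row in tour_np],
                    dtype=np.float32)


# Stand-in for self.tour (assumed int32, shape [batch, tour_len]).
tour = tf.constant([[1, 2, 3, 4], [0, 2, 1, 0]], dtype=tf.int32)

# tf.numpy_function defers the NumPy call until the tensor has a concrete
# value, instead of converting a symbolic tensor at graph-construction time.
reward = tf.numpy_function(tour_reward, [tour], tf.float32)

# The static shape is lost across the wrapper; declare it as rank 1 ([batch]).
reward = tf.ensure_shape(reward, [None])
```

As far as I understand, no gradient flows through tf.numpy_function, which should be fine for a REINFORCE-style loss where the reward is treated as a constant and the gradient comes from the log-probabilities. On older TF 1.x versions, tf.py_func seems to play a similar role.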
Thank you! Any pointer or help is highly appreciated.