I am new to working with contextual bandits in TensorFlow, and I wanted to ask whether TensorFlow supports changing continuous actions. In my application, actions can range from 2 to 100 at different time steps. We currently use Vowpal Wabbit, which does not provide much flexibility, and we want to port our setup to TensorFlow.
Hi @shabib,
You can implement contextual bandits with continuous actions using Deep Deterministic Policy Gradient (DDPG), an approach designed specifically for continuous action spaces.
You can refer to the official documentation for a better understanding.
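To make the idea concrete, here is a rough sketch of a DDPG-style actor and critic for a bandit whose action lies between 2 and 100. It is only an illustration under assumptions, not a reference implementation: the context size, layer widths, learning rates, and the synthetic reward are all made up, and the names `select_action` and `train_step` are hypothetical helpers rather than TensorFlow APIs. Because a contextual bandit has no next state, the sketch drops the target networks and bootstrapped targets that full DDPG uses, so the critic simply regresses the observed reward.

```python
import numpy as np
import tensorflow as tf

ACTION_LOW, ACTION_HIGH = 2.0, 100.0  # action range taken from the question
CONTEXT_DIM = 4                       # hypothetical context size
SCALE = (ACTION_HIGH - ACTION_LOW) / 2.0
MIDPOINT = (ACTION_HIGH + ACTION_LOW) / 2.0

# Actor: context -> normalized action in [-1, 1] (tanh output).
actor = tf.keras.Sequential([
    tf.keras.Input(shape=(CONTEXT_DIM,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="tanh"),
])

# Critic: (context, normalized action) -> predicted reward.
critic = tf.keras.Sequential([
    tf.keras.Input(shape=(CONTEXT_DIM + 1,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])

actor_opt = tf.keras.optimizers.Adam(1e-3)
critic_opt = tf.keras.optimizers.Adam(1e-3)


def select_action(contexts, noise_std=0.0):
    """Return actions in [ACTION_LOW, ACTION_HIGH], with optional Gaussian exploration noise."""
    contexts = np.asarray(contexts, dtype=np.float32)
    normalized = actor(contexts).numpy()
    noise = np.random.normal(0.0, noise_std, size=normalized.shape)
    normalized = np.clip(normalized + noise, -1.0, 1.0)
    return MIDPOINT + SCALE * normalized


def train_step(contexts, actions, rewards):
    """One update from logged (context, action, reward) triples."""
    contexts = tf.constant(np.asarray(contexts, dtype=np.float32))
    rewards = tf.constant(np.asarray(rewards, dtype=np.float32).reshape(-1, 1))
    normalized = (np.asarray(actions, dtype=np.float32).reshape(-1, 1) - MIDPOINT) / SCALE
    normalized = tf.constant(normalized.astype(np.float32))

    # Critic update: fit the observed reward for the action that was taken.
    with tf.GradientTape() as tape:
        predicted = critic(tf.concat([contexts, normalized], axis=1))
        critic_loss = tf.reduce_mean(tf.square(rewards - predicted))
    grads = tape.gradient(critic_loss, critic.trainable_variables)
    critic_opt.apply_gradients(zip(grads, critic.trainable_variables))

    # Actor update: move the actor's action toward higher predicted reward.
    with tf.GradientTape() as tape:
        proposed = actor(contexts)
        actor_loss = -tf.reduce_mean(critic(tf.concat([contexts, proposed], axis=1)))
    grads = tape.gradient(actor_loss, actor.trainable_variables)
    actor_opt.apply_gradients(zip(grads, actor.trainable_variables))
    return float(critic_loss), float(actor_loss)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    contexts = rng.normal(size=(64, CONTEXT_DIM)).astype(np.float32)
    for _ in range(5):
        actions = select_action(contexts, noise_std=0.3)
        # Placeholder reward that peaks at action 40; replace with real feedback.
        rewards = -np.abs(actions - 40.0) / SCALE
        train_step(contexts, actions, rewards)
```

The tanh output plus rescaling keeps every action inside the allowed range, and the exploration noise is clipped before rescaling for the same reason. If the bounds themselves change between time steps, one option would be to pass the current bounds in as part of the context and rescale with them instead of the fixed constants.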
Thank you.