Hi everybody,
despite searching on Google and the forum, I wasn’t able to find a good explanation of how to initialize an agent with a custom policy. How do I do that? Does anybody have a hint?
Greetings,
Stefan
Have you already seen this tutorial?
Yes - I tried to follow “Example 3: Q Policy”, which works so far. But where in this tutorial is the agent located?
[Thank you very much for your time, and sorry if these are dumb questions]
Policies can be created independently of agents - e.g. see the random_policy in the DQN tutorial.
If you are going to create your own custom agent, policy and collect_policy are constructor args:
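To make the structure concrete, here is a minimal plain-Python sketch (not the actual TF-Agents API - the class and method names are illustrative): a policy is an object that exists on its own, and an agent simply receives its policy and collect_policy through the constructor.

```python
import random

class RandomPolicy:
    """A policy usable entirely on its own, with or without an agent."""

    def __init__(self, num_actions):
        self.num_actions = num_actions

    def action(self, observation):
        # Ignores the observation and picks a uniformly random action.
        return random.randrange(self.num_actions)

class Agent:
    """Sketch of an agent that takes its policies as constructor args."""

    def __init__(self, policy, collect_policy):
        self.policy = policy                   # used for evaluation/deployment
        self.collect_policy = collect_policy   # used to gather experience

# The policy is created independently, then handed to the agent.
random_policy = RandomPolicy(num_actions=4)
agent = Agent(policy=random_policy, collect_policy=random_policy)
print(agent.policy.action(observation=None) in range(4))  # True
```

In TF-Agents the same separation holds: you can build and use a policy without ever constructing an agent around it.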
No problem. Deep RL is not easy.
Your agent [an abstract term, so it’s just your program] would try to learn to do X by interacting with an environment (e.g. the game of Pong
via Gym or an Android app via AndroidEnv
using a policy (approximated by a neural net - hence “deep” RL) to gain experience. The policy (your neural net) “belongs” to your agent - it maps the agent’s observations (inputs, could be image pixels or sequential data directly via an API) to its actions/action log probabilities (outputs).
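To illustrate the "policy maps observations to actions" idea without any framework, here is a toy epsilon-greedy policy in plain Python. The q_values function is a stand-in for the neural network - in deep RL that function would be a trained net, everything else is the same shape:

```python
import random

def q_values(observation):
    # Stand-in for a Q-network: pretend scores for 3 actions
    # given a 2-number observation.
    return [observation[0], observation[1], observation[0] + observation[1]]

def epsilon_greedy_policy(observation, epsilon=0.1):
    """Maps an observation to an action: explore with prob. epsilon, else argmax."""
    qs = q_values(observation)
    if random.random() < epsilon:
        return random.randrange(len(qs))                 # explore
    return max(range(len(qs)), key=qs.__getitem__)       # exploit: argmax

action = epsilon_greedy_policy([0.2, 0.5], epsilon=0.0)
print(action)  # 2, since 0.2 + 0.5 is the largest score
```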
In addition, you may find this post - Deep Reinforcement Learning With TensorFlow 2.1 | Roman Ring - helpful. It was written by an engineer who now works at DeepMind. There is also a YouTube channel that teaches the well-known basic deep RL methods - such as DQN, policy gradients, and actor-critic methods - with core TensorFlow 2. Check out https://www.youtube.com/watch?v=LawaN3BdI00 (actor-critic methods) or https://www.youtube.com/watch?v=SMZfgeHFFcA (DQN).
Thank you, the second one seems to be what I am looking for. I tried to code with the dqn_agent and wondered why there was no policy arg.
Thank you for the explanation and the links. I already wrote a custom environment and now I want to write a custom policy as well. Let’s say I have a use case like the following:
There are a number of boxes and a number of pieces. The task of the agent is to choose a piece and sort it into a box so that the load of the fullest box is minimized. I will have a look at the links.
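For a problem like this it often helps to have a simple non-learned baseline to compare the agent against. A common greedy heuristic (this is my suggestion, not anything from the tutorials above) is to sort the pieces by size and always put the next piece into the currently least-loaded box:

```python
def greedy_assign(pieces, num_boxes):
    """Greedy baseline: biggest pieces first, each into the least-loaded box."""
    loads = [0.0] * num_boxes
    assignment = []
    for piece in sorted(pieces, reverse=True):           # big pieces first
        box = min(range(num_boxes), key=loads.__getitem__)
        loads[box] += piece
        assignment.append((piece, box))
    return loads, assignment

loads, assignment = greedy_assign([3, 3, 2, 2], num_boxes=2)
print(loads)  # [5.0, 5.0]
```

This is not guaranteed to be optimal, but it gives the RL agent a concrete score to beat on the "minimize the load of the fullest box" objective.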
You could also write an “MVP” using, for example, the SAC algorithm (by BAIR) with just TF Probability and TF as a start.
Here’s a “clean” example:
(Another example with code: Deep Reinforcement Learning: Playing CartPole through Asynchronous Advantage Actor Critic (A3C) with tf.keras and eager execution — The TensorFlow Blog)
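As a flavor of what the SAC actor does, here is a tiny framework-free sketch (my own illustration, not code from the linked examples): sample an action from a Gaussian whose mean and log-std a network would output, then squash it into [-1, 1] with tanh. In the real thing, mean and log_std come from a TF network and TF Probability handles the distribution.

```python
import math
import random

def sample_squashed_action(mean, log_std):
    """Sample a continuous action and squash it into [-1, 1], SAC-style."""
    std = math.exp(log_std)
    z = random.gauss(mean, std)   # would be a reparameterized sample in practice
    return math.tanh(z)           # tanh keeps the action bounded

a = sample_squashed_action(mean=0.0, log_std=-1.0)
print(-1.0 <= a <= 1.0)  # True
```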
Is this just online 3D bin packing?
Actually, I’m attempting to solve a Flexible Job Shop Scheduling Problem. I try to model it so that the machines are the “boxes” and the jobs’ tasks are the pieces to be sorted into the boxes. I want to test whether that works.
Thank you very much, I will take a look at this code.
You can take a look at