Hello, I am pretty new to TF-Agents and am confused about the parameters and metrics used in the replay buffer, the dynamic step driver, and the agent training.
I would really appreciate it if someone could give me a brief explanation of the following terms:
I have created 4 environments, put them all together in a BatchedPyEnvironment, and then converted it to a TFPyEnvironment, so the batch_size of this environment is 4.
Then I created a TFUniformReplayBuffer, so what are the 1. batch_size and 2. max_length? I understand the batch size is how many elements are stored in the batch, but when I change the max_length value nothing really happens, except that setting it to 1 gives an error.
My observation is a (1, 25) vector of integers.
Then, to read the replay buffer, I create a dataset outside the training loop through replay_buffer.as_dataset. Should this dataset be created at each training iteration instead? Also, what is the difference between the batch_size in the TFUniformReplayBuffer and the 3. sample_batch_size?
When I change num_steps from 2 to 1, it also gives an error, so what does this 4. num_steps mean?
Then I create an iterator = iter(dataset), also outside the training loop.
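A plain-Python toy can make questions 1 to 4 more concrete. It is only a sketch under assumptions, based on how the TF-Agents docs appear to describe these parameters: batch_size is the number of parallel environment rows, max_length is the capacity of each row, num_steps is how many consecutive steps make up one sampled item, and sample_batch_size is how many such items are returned per sample. ToyBuffer and its methods are hypothetical names, not TF-Agents API, and the real buffer may differ in details such as which error it raises.

```python
import random
from collections import deque


class ToyBuffer:
    """Hypothetical stand-in for a uniform replay buffer (not TF-Agents API)."""

    def __init__(self, batch_size, max_length):
        # One FIFO row per parallel environment; each row keeps max_length steps.
        self.rows = [deque(maxlen=max_length) for _ in range(batch_size)]

    def add_batch(self, items):
        # Expects exactly one item per environment, like a batched trajectory.
        if len(items) != len(self.rows):
            raise ValueError("need one item per environment")
        for row, item in zip(self.rows, items):
            row.append(item)

    def capacity(self):
        # Total capacity is batch_size * max_length in this toy.
        return sum(row.maxlen for row in self.rows)

    def sample(self, sample_batch_size, num_steps, rng=random):
        # Each sampled item is num_steps consecutive steps from a single row.
        out = []
        for _ in range(sample_batch_size):
            row = list(rng.choice(self.rows))
            if len(row) < num_steps:
                raise ValueError("row too short for num_steps consecutive steps")
            start = rng.randrange(len(row) - num_steps + 1)
            out.append(row[start:start + num_steps])
        return out


buf = ToyBuffer(batch_size=4, max_length=3)
for t in range(5):
    # Step t from each of the 4 environments, tagged (env_index, t).
    buf.add_batch([(env, t) for env in range(4)])

# 8 items, each a window of 2 consecutive steps from one environment.
batch = buf.sample(sample_batch_size=8, num_steps=2)
```

In this toy, max_length=1 combined with num_steps=2 fails because a row never holds two consecutive steps. That would be consistent with the error described above, but whether it is the actual cause in TF-Agents is an assumption.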
Also, when I check the 5. number of episodes and 6. number of steps with num_episodes.result().numpy() and env_steps.result().numpy() after the training loop, they show different numbers every time I run the training, and I can’t control them.
I create a step driver, dynamic_step_driver.DynamicStepDriver, to collect experience inside the loop, but I’m also not sure what the 7. num_steps in it really represents. In the training loop I can control the 8. num_iterations, and I thought this would be the same as the number of episodes I get from num_episodes.result().numpy(), and that the number of steps I get from env_steps.result().numpy() would be the same as num_steps in dynamic_step_driver.DynamicStepDriver, but they are all different.
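The mismatch between num_iterations, the driver's num_steps, and the two metrics can be sketched in plain Python. The sketch assumes that the driver's num_steps is a total summed across the batched environments, that every driver loop steps all environments at once, and that the episode count goes up whenever any single environment finishes an episode. toy_run and the fixed episode_len are hypothetical; real environments end episodes at varying times, and the real metrics may treat boundary steps differently.

```python
import math


def toy_run(num_iterations, driver_num_steps, batch_size, episode_len):
    """Hypothetical model of a training loop that calls a step driver once per iteration."""
    env_steps = 0
    num_episodes = 0
    # Steps taken so far in the current episode of each parallel environment.
    progress = [0] * batch_size
    for _ in range(num_iterations):
        collected = 0
        # The driver keeps stepping the whole batch until it has enough steps.
        while collected < driver_num_steps:
            for i in range(batch_size):
                progress[i] += 1
                if progress[i] == episode_len:
                    num_episodes += 1
                    progress[i] = 0
            collected += batch_size
        env_steps += collected
    return env_steps, num_episodes


env_steps, num_episodes = toy_run(
    num_iterations=10, driver_num_steps=2, batch_size=4, episode_len=8
)
# Steps one driver call takes in this toy: rounded up to a whole batch.
per_call = math.ceil(2 / 4) * 4
```

With these toy settings, 10 iterations give 40 environment steps and 4 episodes, so neither metric matches num_iterations or the driver's num_steps. With episodes of varying length, the counts would also change from run to run.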
Any hints would be very helpful. Thanks!