Here I’m measuring steps/sec, so a higher number is faster.
I also tried to wrap everything in a function annotated with ‘@tf.function’, but that didn’t help either.
Hi lorenzos. Sorry, I completely misread your post. Looking at it again, isn’t the issue that the loop is placed around the wrong thing? My understanding is that you run the random policy 4000 times, each time restarting the whole program from scratch, instead of running 4000 steps of that random policy (and it makes sense that looping like this is slower with the TensorFlow environment).
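To illustrate what I mean by the loop placement, here is a toy sketch in plain Python. ToyEnv, rebuild_every_step, build_once and steps_per_second are all made-up names for illustration, not TF-Agents APIs, and the "setup" here is just a counter standing in for whatever your program does before stepping.

```python
import time


class ToyEnv:
    """Hypothetical stand-in for an environment whose setup is costly."""

    setups = 0  # counts how many times an environment was constructed

    def __init__(self):
        ToyEnv.setups += 1
        self.steps = 0

    def step(self):
        self.steps += 1


def rebuild_every_step(num_steps):
    # Loop around the whole program: the setup cost is paid num_steps times.
    for _ in range(num_steps):
        env = ToyEnv()
        env.step()


def build_once(num_steps):
    # Loop around the step only: the setup cost is paid once.
    env = ToyEnv()
    for _ in range(num_steps):
        env.step()


def steps_per_second(run, num_steps):
    """Time run(num_steps) and return steps/sec (higher is faster)."""
    start = time.perf_counter()
    run(num_steps)
    elapsed = time.perf_counter() - start
    return num_steps / elapsed if elapsed > 0 else float("inf")
```

If the real setup is expensive (building the environment, tracing a tf.function, and so on), the first pattern will report far fewer steps/sec than the second, even though both take the same number of environment steps.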
Based on this TensorFlow documentation, running 4000 steps of the random policy is faster with the TensorFlow environment (I replaced dynamic_step_driver.DynamicStepDriver with tf_driver.TFDriver to make the Python and TensorFlow versions more “similar”):
Python:
import time
from tf_agents.environments import suite_gym
from tf_agents.drivers import py_driver
from tf_agents.environments import tf_py_environment
from tf_agents.policies import random_py_policy
TensorFlow:
import time
from tf_agents.environments import suite_gym
from tf_agents.drivers import tf_driver
from tf_agents.environments import tf_environment
from tf_agents.policies import random_tf_policy