According to my expectations, map_fn should execute the mapped function in parallel. Unfortunately, it seems to execute it sequentially. Here is a brief example I created to verify my concern.
import tensorflow as tf

@tf.function
def test_mapfn():
    @tf.function
    def body(vals):
        res = tf.constant(1, dtype=tf.float32)
        for i in tf.range(5, dtype=tf.float32):
            tf.print(i)
            # Doing some random calculation below. Not important.
            res = res + tf.pow(
                tf.reduce_sum(tf.where(vals > 0.5, vals, tf.constant(0, dtype=tf.float32)))
                + tf.pow(tf.reduce_sum(vals), 0.5)
                - tf.pow(tf.reduce_sum(vals), tf.constant(2, dtype=tf.float32)),
                -i)
        return res

    tensor = tf.random.uniform(shape=(2, 100000000))
    res = tf.map_fn(body, tensor, parallel_iterations=2)
    return res

test_mapfn()
The printed values are:
0
1
2
3
4
0
1
2
3
4
If executed in parallel, the code above should interleave the printed loop values, but they are printed sequentially. I know this is a pretty basic test, but I noticed the same behavior with functions that take much longer to execute: the CPU is not fully utilized, and the function appears to be called sequentially on the unrolled inputs. I know the recommendation is to use vectorized_map, but unfortunately it is not applicable in my use case, since the code can't be rewritten to comply with the requirements of vectorized_map. Also, changing the number of parallel_iterations doesn't seem to affect the speed.
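For reference, here is roughly the kind of rewrite the vectorized_map recommendation implies, using a simplified, hypothetical stand-in body (not my real computation); my actual body can't be restructured this way, which is why that recommendation doesn't help me.

import tensorflow as tf

# Hypothetical simplified body with no per-element loop or printing,
# just to illustrate the vectorized_map style of rewrite.
def simple_body(vals):
    return tf.reduce_sum(tf.where(vals > 0.5, vals, tf.zeros_like(vals)))

tensor = tf.random.uniform(shape=(2, 1000))
res = tf.vectorized_map(simple_body, tensor)  # one scalar per row, shape (2,)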