TensorFlow: map_fn and parallel execution

I expected map_fn to execute the mapped function in parallel across elements. Instead, it appears to execute it sequentially. Here is a brief example I wrote to check this.

import tensorflow as tf

@tf.function
def test_mapfn():
    @tf.function
    def body(vals):
        res = tf.constant(1, dtype=tf.float32)
        for i in tf.range(5, dtype=tf.float32):
            tf.print(i)
            # Arbitrary computation to keep each iteration busy; the exact formula is not important.
            total = tf.reduce_sum(vals)
            masked = tf.reduce_sum(tf.where(vals > 0.5, vals, tf.constant(0, dtype=tf.float32)))
            res = res + tf.pow(masked + tf.pow(total, 0.5) - tf.pow(total, tf.constant(2, dtype=tf.float32)), -i)
        return res

    tensor = tf.random.uniform(shape=(2, 100000000))
    # Map body over the two rows, requesting two parallel iterations.
    return tf.map_fn(body, tensor, parallel_iterations=2)

test_mapfn()

The printed values are

0
1
2
3
4
0
1
2
3
4

If the two map_fn iterations ran in parallel, I would expect the printed values from both to be interleaved, but they are printed strictly one iteration after the other. I know this is a very basic test, but I see the same behavior with functions that take longer to execute: the CPU is not fully utilized, and the function appears to be called sequentially on the unstacked inputs. I am aware of the recommendation to use vectorized_map, but it is not applicable in my use case because the code can’t be rewritten to meet the requirements of vectorized_map. Also, changing the value of parallel_iterations doesn’t seem to affect the speed.
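The speed observation can also be checked without relying on print order, by timing the same map_fn call under different parallel_iterations values. The following is only a minimal sketch: the workload in body, the helper names make_mapper and time_mapper, and the tensor shape are assumptions made for illustration, not part of the example above. Whether any difference shows up will depend on the ops involved and on the machine.

```python
import time

import tensorflow as tf


def body(vals):
    # Hypothetical stand-in workload: a few reductions over one row.
    res = tf.constant(1.0)
    for i in range(5):  # plain Python loop, unrolled when the graph is traced
        res = res + tf.reduce_sum(tf.sqrt(vals)) * float(i)
    return res


def make_mapper(parallel_iterations):
    # Build a separate tf.function per setting so each one is traced on its own.
    @tf.function
    def run(tensor):
        return tf.map_fn(body, tensor, parallel_iterations=parallel_iterations)

    return run


def time_mapper(parallel_iterations, tensor, repeats=3):
    run = make_mapper(parallel_iterations)
    run(tensor)  # warm-up call so tracing is not included in the timing
    start = time.perf_counter()
    for _ in range(repeats):
        result = run(tensor)
    result.numpy()  # make sure the last result has been materialized
    elapsed = (time.perf_counter() - start) / repeats
    return elapsed, result


if __name__ == "__main__":
    data = tf.random.uniform(shape=(8, 1_000_000))
    for n in (1, 2, 8):
        seconds, _ = time_mapper(n, data)
        print(f"parallel_iterations={n}: {seconds:.4f} s per call")
```

If the timings are essentially the same for every setting, that is consistent with the iterations not overlapping for this workload, although it does not by itself show why.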


Hi @Yani_Boshev, I tried the above code in a different way, and the execution happens in parallel. Please refer to this gist and let us know whether this is the behavior you are expecting.

Thank you.