tf.random.uniform() adds significant latency in preprocessing functions passed to TFRecordDataset.map()

Hello,

I’m using the tf.random.uniform() function for data augmentation in the following style:

def augment_fn(x):
    if tf.random.uniform(()) > 0.5:
        pass  # DO SOMETHING
    if tf.random.uniform(()) > 0.5:
        pass  # DO SOMETHING
    return x

I then apply it to my dataset like this:

train_dataset = tf.data.TFRecordDataset(tffiles).map(augment_fn)
train_dataset = train_dataset.batch(batch_size).prefetch(buffer_size=tf.data.AUTOTUNE)

However, if I replace the random condition with a constant True, the latency of each step drops from 150 ms to 60 ms …

So I’m wondering: is tf.random.uniform() genuinely this slow, or am I using it incorrectly?

And if it is truly slow, is there a more effective way to implement random data augmentation?

Thank you!

Sorry, please forgive my stupidity… the problem is solved by passing num_parallel_calls=tf.data.AUTOTUNE to map(). :sob: :sob: :sob:
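
For anyone who hits the same thing, the pipeline with the fix looks like this:

train_dataset = tf.data.TFRecordDataset(tffiles).map(augment_fn, num_parallel_calls=tf.data.AUTOTUNE)
train_dataset = train_dataset.batch(batch_size).prefetch(buffer_size=tf.data.AUTOTUNE)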

Hi @Jue_Wu ,

The tf.random.uniform() function is not inherently slow, but calling it several times inside the map function, as in your example, adds overhead for every element because each call generates a separate random number. Instead, you can generate the random numbers once, as a single tensor, and reuse them for the individual augmentation decisions.

The observed latency difference is likely because each tf.random.uniform(()) > 0.5 check generates its own random number independently. This means you are generating two random numbers for every element processed, whereas with a constant True condition no random number is generated at all.
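
A minimal sketch of that idea (the flips here are placeholder augmentations, and it assumes x is already a decoded image tensor rather than a raw serialized record):

def augment_fn(x):
    # Draw both random numbers with a single op and reuse them.
    coin = tf.random.uniform((2,))
    # Placeholder augmentations; substitute your own ops.
    x = tf.cond(coin[0] > 0.5, lambda: tf.image.flip_left_right(x), lambda: x)
    x = tf.cond(coin[1] > 0.5, lambda: tf.image.flip_up_down(x), lambda: x)
    return x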

You can find TensorFlow’s image augmentation functions in the official API documentation for the tf.image module: https://www.tensorflow.org/api_docs/python/tf/image
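
For example, tf.image also provides random versions of common augmentations that handle the random draw internally (again assuming x is a decoded image tensor):

def augment_fn(x):
    # Each op performs its own uniform random draw internally.
    x = tf.image.random_flip_left_right(x)
    x = tf.image.random_brightness(x, max_delta=0.1)
    return x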

I hope this helps!

Thanks
