tf.random.uniform() adds significant latency in preprocessing functions passed to TFRecordDataset.map()

Hello,

I’m using the tf.random.uniform() function for data augmentation in the following style:

def augment_fn(x):
    if tf.random.uniform(()) > 0.5:
        pass  # DO SOMETHING
    if tf.random.uniform(()) > 0.5:
        pass  # DO SOMETHING
    return x

I then apply it to my dataset like this:

train_dataset = tf.data.TFRecordDataset(tffiles).map(augment_fn)
train_dataset = train_dataset.batch(batch_size).prefetch(buffer_size=tf.data.AUTOTUNE)

However, if I replace the random condition with a constant True, the latency of each step drops from 150 ms to 60 ms …

So I’m wondering: is tf.random.uniform() genuinely this slow, or am I using it incorrectly?

And if it is truly slow, is there a more effective way to implement random data augmentation?

Thank you!

Sorry, please forgive my stupidity… the problem is solved by passing num_parallel_calls=tf.data.AUTOTUNE to map(). :sob: :sob: :sob:
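
For anyone who hits the same thing, the pipeline with the fix looks like this:

train_dataset = tf.data.TFRecordDataset(tffiles).map(augment_fn, num_parallel_calls=tf.data.AUTOTUNE)
train_dataset = train_dataset.batch(batch_size).prefetch(buffer_size=tf.data.AUTOTUNE)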

Hi @Jue_Wu ,

The tf.random.uniform() function is not inherently slow, but calling it several times inside the map function, as in your example, adds overhead for every element because each call generates a separate random number. Instead, you can generate the random numbers once, as a single tensor, and reuse them for the individual augmentation decisions.

The observed latency difference is likely because each tf.random.uniform(()) > 0.5 check generates its own random number independently. This means you are generating two random numbers for every element processed, whereas with a constant True condition no random number is generated at all.
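
A minimal sketch of that idea (the flips here are placeholder augmentations, and it assumes x is already a decoded image tensor rather than a raw serialized record):

def augment_fn(x):
    # Draw both random numbers with a single op and reuse them.
    coin = tf.random.uniform((2,))
    # Placeholder augmentations; substitute your own ops.
    x = tf.cond(coin[0] > 0.5, lambda: tf.image.flip_left_right(x), lambda: x)
    x = tf.cond(coin[1] > 0.5, lambda: tf.image.flip_up_down(x), lambda: x)
    return x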

You can find TensorFlow’s image augmentation functions in the official API documentation for the tf.image module: https://www.tensorflow.org/api_docs/python/tf/image
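
For example, tf.image also provides random versions of common augmentations that handle the random draw internally (again assuming x is a decoded image tensor):

def augment_fn(x):
    # Each op performs its own uniform random draw internally.
    x = tf.image.random_flip_left_right(x)
    x = tf.image.random_brightness(x, max_delta=0.1)
    return x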

I hope this helps!

Thanks
