Is there a proper way to perform distributed training on TF Probability with mirrored strategy?
When we sample the weights from the Reparameterization layers, they differ a lot between devices, and because of that the loss and gradients get messed up, since my model replicas go out of sync across the devices.
Is there a proper way to set the random seed across all GPUs and sync the models across all nodes?
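For reference, here is a minimal sketch of roughly what I'm doing (the layer sizes, loss, and the use of tfp.layers.DenseReparameterization are just placeholders for my actual model):

import tensorflow as tf
import tensorflow_probability as tfp

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Bayesian layers whose weights are sampled via the
    # reparameterization trick on every forward pass.
    model = tf.keras.Sequential([
        tfp.layers.DenseReparameterization(64, activation="relu"),
        tfp.layers.DenseReparameterization(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Each replica draws its own weight samples during the forward pass,
# so the per-replica losses and gradients end up different.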
Thanks.
Hello @piEsposito
Thank you for using TensorFlow.
As the tf.distribute.MirroredStrategy documentation states: "Each variable in the model is mirrored across all the replicas. Together, these variables form a single conceptual variable called MirroredVariable. These variables are kept in sync with each other by applying identical updates."
import tensorflow as tf

seed = 42  # any fixed integer
tf.random.set_seed(seed)
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    tf.random.set_seed(seed)  # seed again inside the scope
The tf.random.set_seed function is used to set the global random seed. In a distributed setup, we have to set the seed manually to make sure all devices are initialized with the same seed.
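Putting this together with a TFP layer, a minimal sketch might look like the following (tfp.layers.DenseReparameterization and the layer sizes are only illustrative; please verify on your own setup that the per-replica weight samples actually stay in sync):

import tensorflow as tf
import tensorflow_probability as tfp

seed = 42  # any fixed integer
tf.random.set_seed(seed)

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    tf.random.set_seed(seed)  # re-seed inside the scope so every replica starts identically
    model = tf.keras.Sequential([
        tfp.layers.DenseReparameterization(64, activation="relu"),
        tfp.layers.DenseReparameterization(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")

# model.fit(...) then runs one replica per GPU; the model variables are
# mirrored and kept in sync by applying identical updates on each replica.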
Thank you.