Effective learning rate when using tf.distribute.MirroredStrategy (one host, multi-GPU)

marcocintra · March 18, 2024, 3:21pm

Hi!

When using tf.distribute.MirroredStrategy (one host, multi-GPU), is the effective learning rate the desired learning rate multiplied by the number of GPUs, or is it just the learning rate I would use with a single GPU?

For example, say I want a learning rate of 1E-3 on 1 GPU, so I just use learning rate = 1E-3 (without tf.distribute.MirroredStrategy). If I use tf.distribute.MirroredStrategy with 8 GPUs, should I set learning rate = 8E-3 (8 * 1E-3), the same way I should multiply the batch size by 8 when scaling to 8 GPUs, or should I just use 1E-3 as the learning rate?

Thanks in advance!

Tim_Wolfe · March 18, 2024, 6:08pm

No. With tf.distribute.MirroredStrategy on multiple GPUs, the learning rate is not automatically scaled by the number of GPUs. Start with the same learning rate you would use on a single GPU and adjust it based on what you observe. Scaling the learning rate with the number of GPUs is a heuristic that may help, but it requires experimentation.
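
To make the two options concrete, here is a minimal sketch of the arithmetic in plain Python. The helper names (global_batch_size, candidate_learning_rates) and the per-GPU batch size of 64 are assumptions for illustration only, not part of the TensorFlow API. In a real script the replica count would typically come from strategy.num_replicas_in_sync.

```python
# Hypothetical helpers for illustration; they are not TensorFlow functions.

def global_batch_size(per_replica_batch_size: int, num_replicas: int) -> int:
    """Batch size seen by the whole job when every GPU gets its own batch."""
    return per_replica_batch_size * num_replicas


def candidate_learning_rates(base_lr: float, num_replicas: int) -> dict:
    """Return both starting points from this thread.

    "unscaled" keeps the single-GPU learning rate as-is.
    "linearly_scaled" multiplies it by the number of GPUs, which is only
    a heuristic to try, not something the strategy applies for you.
    """
    return {
        "unscaled": base_lr,
        "linearly_scaled": base_lr * num_replicas,
    }


# With TensorFlow this would typically be:
#   strategy = tf.distribute.MirroredStrategy()
#   num_replicas = strategy.num_replicas_in_sync
num_replicas = 8          # assumed: one host with 8 GPUs
per_replica_batch = 64    # assumed per-GPU batch size
base_lr = 1e-3            # single-GPU learning rate from the question

print("global batch size:", global_batch_size(per_replica_batch, num_replicas))
print("learning rates to try:", candidate_learning_rates(base_lr, num_replicas))
```

Whichever value you pick is passed to the optimizer unchanged, so treat both as candidates to compare on your own model rather than as a rule.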

marcocintra · March 18, 2024, 7:00pm

OK, I think this is why the TensorFlow guide (https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit, see the end of the section) says that when scaling to N GPUs it will be necessary to tune the learning rate, but DEPENDING ON THE MODEL. I think this follows what you’ve said: tuning may or may not be necessary, so it is not a rule, correct? Thanks!
