Distributed Training
Of the strategies used in distributed training, comparing MultiWorkerMirroredStrategy and ParameterServerStrategy: with the former, each worker has direct access to the dataset and the variables are trained synchronously across workers, while with ParameterServerStrategy a parameter server holds the variables, right?
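For reference, here is roughly what I mean by "workers have direct access to the dataset" with MultiWorkerMirroredStrategy. This is only a minimal sketch assuming TF 2.x, a TF_CONFIG environment variable already set on every worker, and a toy model with random data as placeholders:

```python
import tensorflow as tf

# Every worker runs this same script; TF_CONFIG (assumed to be set) lists all workers.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Variables are mirrored on each worker and kept in sync by all-reducing
    # gradients every step (synchronous training).
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer="sgd", loss="mse")

def make_dataset():
    # Placeholder data: each worker builds (and therefore must be able to read)
    # the dataset locally.
    x = tf.random.uniform([64, 4])
    y = tf.random.uniform([64, 1])
    return tf.data.Dataset.from_tensor_slices((x, y)).repeat().batch(8)

model.fit(make_dataset(), steps_per_epoch=10, epochs=2)
```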
So does that mean that, when implementing ParameterServerStrategy, the dataset can be stored where the parameter server is instantiated (say, a server with plenty of storage) and training calls can be dispatched to workers on another machine (say, an NVIDIA DGX) without the workers having direct access to the dataset?
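For context, this is the TF 2.x coordinator-style ParameterServerStrategy setup I am picturing, based on the ClusterCoordinator API. It is only a sketch: TF_CONFIG, the toy model, and the random data are placeholders, and the comment inside dataset_fn marks exactly the part I am unsure about (where the data actually needs to live):

```python
import tensorflow as tf

# Assumes TF_CONFIG is set on the coordinator, the workers, and the parameter
# servers, and that the worker/ps tasks are already running.
cluster_resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()
strategy = tf.distribute.experimental.ParameterServerStrategy(cluster_resolver)
coordinator = tf.distribute.experimental.coordinator.ClusterCoordinator(strategy)

with strategy.scope():
    # Variables are placed on the parameter servers.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    optimizer = tf.keras.optimizers.SGD(0.01)

def dataset_fn(input_context):
    # As I understand it, this function is executed on each worker, so the
    # workers would still need to reach whatever storage backs the real data
    # -- which is exactly what my question is about.
    x = tf.random.uniform([64, 4])
    y = tf.random.uniform([64, 1])
    return tf.data.Dataset.from_tensor_slices((x, y)).repeat().batch(8)

@tf.function
def per_worker_dataset_fn():
    return strategy.distribute_datasets_from_function(dataset_fn)

per_worker_dataset = coordinator.create_per_worker_dataset(per_worker_dataset_fn)
per_worker_iterator = iter(per_worker_dataset)

@tf.function
def train_step(iterator):
    def replica_fn(x, y):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(model(x) - y))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss

    x, y = next(iterator)
    losses = strategy.run(replica_fn, args=(x, y))
    return strategy.reduce(tf.distribute.ReduceOp.MEAN, losses, axis=None)

# The coordinator dispatches training steps to the workers asynchronously.
for _ in range(100):
    coordinator.schedule(train_step, args=(per_worker_iterator,))
coordinator.join()
```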
The tutorials, the Inside TensorFlow series, and the Coursera course (Laurence Moroney's Custom and Distributed Training) don't seem to have an end-to-end working demo of ParameterServerStrategy code to test across multiple devices.
Please correct me if I am wrong anywhere…
including a discussion of this Stack Overflow thread: tensorflow2.0 - When is TensorFlow's ParameterServerStrategy preferable to its MultiWorkerMirroredStrategy? - Stack Overflow