Hello,
I need to understand how ParameterServerStrategy distributes a dataset of TFRecords across the workers, so I wrote a script to run training under the profiler. However, I noticed that every worker was reading the full dataset. I think the workers are processing different data, since the execution time does decrease when I add more workers, but in that case shouldn't each worker only read the data it is actually going to use?
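For context, my input setup follows roughly the standard coordinator pattern from the tutorial (the file pattern, `parse_fn`, and batch size below are placeholders for my real pipeline):

```python
import tensorflow as tf

cluster_resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()
strategy = tf.distribute.experimental.ParameterServerStrategy(cluster_resolver)

def parse_fn(record):
    # Placeholder: the real one uses tf.io.parse_single_example.
    return record

def dataset_fn(input_context):
    # Runs on every worker and builds that worker's input pipeline.
    dataset = tf.data.TFRecordDataset(tf.io.gfile.glob("data/*.tfrecord"))
    return dataset.map(parse_fn).batch(32).prefetch(tf.data.AUTOTUNE)

@tf.function
def per_worker_dataset_fn():
    return strategy.distribute_datasets_from_function(dataset_fn)

coordinator = tf.distribute.experimental.coordinator.ClusterCoordinator(strategy)
per_worker_dataset = coordinator.create_per_worker_dataset(per_worker_dataset_fn)
```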
I have tried sharding using:
```python
options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = (
    tf.data.experimental.AutoShardPolicy.AUTO)
dataset = dataset.with_options(options)
```
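(If I'm reading the AutoShardPolicy docs right, AUTO falls back to DATA sharding when the input can't be split by files, and under DATA sharding every worker still reads the whole dataset and just discards the elements outside its shard. So maybe I need FILE instead, along these lines, assuming there are at least as many files as workers:)

```python
options = tf.data.Options()
# FILE shards at the file level, so each worker should only open
# the files assigned to it instead of reading every record.
options.experimental_distribute.auto_shard_policy = (
    tf.data.experimental.AutoShardPolicy.FILE)
dataset = dataset.with_options(options)
```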
and also explicit sharding with:
```python
dataset = dataset.shard(
    input_context.num_input_pipelines, input_context.input_pipeline_id)
```
But with neither approach can I get the workers to read/prefetch only the data they actually need to train the model…
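From reading around, I suspect that calling `.shard` on the already-flattened record dataset still makes every worker read every record and simply drop the ones outside its shard. Would sharding the list of filenames before interleaving, so that each worker only ever opens its own files, be the right approach? Something like this (paths and batch size are again placeholders):

```python
def dataset_fn(input_context):
    files = tf.data.Dataset.list_files("data/*.tfrecord", shuffle=False)
    # Shard the filenames rather than the records, so each worker
    # only opens the files in its own shard.
    files = files.shard(input_context.num_input_pipelines,
                        input_context.input_pipeline_id)
    dataset = files.interleave(tf.data.TFRecordDataset,
                               num_parallel_calls=tf.data.AUTOTUNE)
    return dataset.map(parse_fn).batch(32).prefetch(tf.data.AUTOTUNE)
```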
Am I going about this the wrong way? And if so, what should I do so that each worker doesn't read the entire dataset?