I am interested in scaling an existing model that uses a "custom data loader" built on tensorflow.keras.utils.Sequence to multi-GPU training. Can anybody share a few thoughts?
The "custom data loader" is built on tensorflow.keras.utils.Sequence rather than tf.data because of the nature of the dataset.
The following code is a minimal example.
The above example uses multiprocessing with the custom data loader on a single node with multiple CPUs. Is there a way to scale it to a multi-GPU mirrored strategy while keeping a custom data loader like the one in the example?
I dug around a bit, but most of the examples in the official documentation use tf.data for multi-GPU training, which makes them a little complicated to adapt.
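For reference, a Sequence-based loader of the kind referred to below might look roughly like this; the MnistSequence internals here are a simplified sketch inferred from how it is called later in the thread, not the original minimal example.

import math
import numpy as np
import tensorflow as tf

class MnistSequence(tf.keras.utils.Sequence):
    """Simplified sketch of a Sequence-based MNIST loader (assumed structure)."""
    def __init__(self, x, y, batch_size, mode):
        self.x, self.y = x, y
        self.batch_size = batch_size
        self.mode = mode  # e.g. 'TRAIN' or 'VAL'

    def __len__(self):
        # Number of batches per epoch
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        # Images as float32 in [0, 1], labels one-hot encoded
        batch_x = self.x[lo:hi].astype("float32") / 255.0
        batch_x = np.expand_dims(batch_x, -1)  # -> (batch, 28, 28, 1)
        batch_y = tf.keras.utils.to_categorical(self.y[lo:hi], num_classes=10)
        return batch_x, batch_y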
I don't fully understand what you want, but let me add my 2 cents.
If you want to train the model on multiple GPUs, you might look into distribution strategies rather than the data loader.
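For example, with tf.distribute.MirroredStrategy you create and compile the model inside the strategy scope and then call fit() as usual. A minimal sketch (the model below is just a placeholder, not the model from your example):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Placeholder model; build and compile your own model here.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])

# model.fit(...) then runs on all visible GPUs.

The variables are mirrored on each GPU and gradients are aggregated across replicas, so the training loop itself does not change.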
To use a distribution strategy, the data must be pipelined in a distributed way. Most of the examples shown use the tf.data API together with well-known datasets from TensorFlow Datasets. But if the dataset comes from a custom loader like the one above (using tensorflow.keras.utils.Sequence), things may change when distributing data across multiple GPUs. I just want to know the right way to do this.
One way to do this is tf.data.Dataset.from_generator, but something is not working out:
seq_iter_tr = lambda: (s for s in MnistSequence(x_train, y_train, batch_size, 'TRAIN'))
seq_iter_ts = lambda: (s for s in MnistSequence(x_test, y_test, batch_size, 'VAL'))

# Wrap the Sequence generators in tf.data datasets. The images and one-hot
# labels are float tensors, so tf.float32 is used in the output_signature
# (not tf.string). If the last batch can be smaller than batch_size, the
# batch dimension may need to be None instead.
seq_train = tf.data.Dataset.from_generator(
    seq_iter_tr,
    output_signature=(
        tf.TensorSpec(shape=(batch_size, 28, 28, 1), dtype=tf.float32),
        tf.TensorSpec(shape=(batch_size, num_classes), dtype=tf.float32)))
seq_test = tf.data.Dataset.from_generator(
    seq_iter_ts,
    output_signature=(
        tf.TensorSpec(shape=(batch_size, 28, 28, 1), dtype=tf.float32),
        tf.TensorSpec(shape=(batch_size, num_classes), dtype=tf.float32)))
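The intended usage would then be roughly as follows, relying on Keras distributing a tf.data.Dataset passed to fit() across the replicas (build_model here is a placeholder, not the actual model code from the minimal example):

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = build_model()  # placeholder: returns the compiled Keras model

# Each dataset element is already a full batch produced by the Sequence;
# Keras splits it across the replicas when the dataset is passed to fit().
model.fit(seq_train.prefetch(tf.data.AUTOTUNE),
          validation_data=seq_test.prefetch(tf.data.AUTOTUNE),
          epochs=5)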