Significant drop in the model's performance metric (top-K accuracy) when going from 1 GPU to 2 or 4 GPUs

Hi everyone. As the title says, the model's top-K accuracy drops when I train on more than one GPU.
The (custom) training job runs on the Vertex AI training service.
This is the image I am using: us-docker.pkg.dev/vertex-ai/training/tf-gpu.2-9:latest
These are the machine types: a2-ultragpu-1g, a2-ultragpu-2g, and a2-ultragpu-4g, with 1, 2, and 4 GPUs respectively.
I’m following this tutorial:

This is my implementation of the strategy:

tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.ReductionToOneDevice(reduce_to_device="cpu:0"))
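
For reference, here is a simplified sketch of how the strategy is wired into the rest of my training code (build_model(), the loss, and the metric are placeholders rather than my exact code):

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.ReductionToOneDevice(reduce_to_device="cpu:0")
)

# Variables (model, optimizer, metrics) are created inside the strategy scope
# so they are mirrored across all GPUs.
with strategy.scope():
    model = build_model()  # placeholder for my actual model definition
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss="sparse_categorical_crossentropy",
        metrics=[tf.keras.metrics.SparseTopKCategoricalAccuracy(k=5)],
    )
```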

I also scaled the global batch size with the number of GPUs:
batch_size_2GPUs = batch_size_1GPU x 2
batch_size_4GPUs = batch_size_1GPU x 4
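
Concretely, the scaling looks roughly like this (per_replica_batch_size is the value I used on a single GPU; the input pipeline below is a simplified placeholder for my real one):

```python
per_replica_batch_size = 64  # placeholder for my single-GPU batch size
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync

# Simplified dataset pipeline; the relevant part is batching with the
# scaled global batch size.
train_ds = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))  # placeholder data
    .shuffle(10_000)
    .batch(global_batch_size, drop_remainder=True)
    .prefetch(tf.data.AUTOTUNE)
)

model.fit(train_ds, epochs=10)
```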

Is there anything else I need to do, at the code level, to get the same top-K accuracy in each case?
Thanks in advance.
