Impact of distribution strategy on keras SavedModel variables size on disk

When I save a tf.keras model created and compiled under a tf.distribute.MirroredStrategy scope on a Vertex AI Workbench instance with 4 T4 GPUs attached, the resulting SavedModel variables are roughly 3x the size on disk (1.3GB) of those from a model compiled on the same instance with the default strategy and then saved (430MB).
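For reference, a minimal sketch of what I'm doing. The model architecture and the save paths here are placeholders; my real model has ~430MB of variables:

```python
import tensorflow as tf

def build_model():
    # Placeholder architecture; the actual model is much larger.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(1024, activation="relu", input_shape=(2048,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model

# Default strategy: variables end up ~430MB on disk for my model.
model = build_model()
model.save("saved_default")

# MirroredStrategy across the 4 T4 GPUs: variables are ~3x larger (~1.3GB).
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    mirrored_model = build_model()
mirrored_model.save("saved_mirrored")
```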

If I reload the 1.3GB saved model with the default strategy, then resave it, the variables remain at 1.3GB, instead of shrinking to 430MB.
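Concretely, something like this (using the placeholder paths from my sketch above):

```python
import tensorflow as tf

# Reload the mirrored SavedModel under the default strategy (no scope)...
reloaded = tf.keras.models.load_model("saved_mirrored")

# ...then resave it. The variables directory stays at ~1.3GB
# instead of shrinking to the ~430MB of the default-strategy save.
reloaded.save("saved_resaved")
```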

I don’t even have to train the compiled model to see these differences.

I’ve read the guides and tutorials about model saving and loading, and I’m still struggling to understand why this happens. Can anyone shed some light? Is this known behavior?

Hello @jasonbrancazio,
Thank you for using TensorFlow.

It seems the issue is on the model-saving side: MirroredStrategy creates a replica of the model's variables on each of the GPUs and keeps them in sync during training, and when the model is saved those replicated variables are written out with it. That is why the SavedModel is roughly 3x larger than one saved from a model built with the default (single-device) strategy. As for loading and resaving: the saving and loading guide documents that after restoring the model you can continue training it, even without needing to call Model.compile, since it was already compiled before saving. The restored model is therefore the same model as before, which is also why resaving it reproduces the same 1.3GB of variables rather than shrinking them back to 430MB.
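If it helps to confirm, you can list what is actually stored in each SavedModel's checkpoint and compare the two. The path below assumes the placeholder name from the question's sketch; `variables/variables` is the checkpoint prefix that the Keras SavedModel export writes inside the model directory:

```python
import tensorflow as tf

# List every variable stored in the SavedModel's checkpoint, with its shape.
# Comparing the listings for the mirrored and default saves shows where
# the extra size comes from.
for name, shape in tf.train.list_variables("saved_mirrored/variables/variables"):
    print(name, shape)
```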