Best way to choose steps_per_execution?

I have a few questions about the steps_per_execution argument in the Keras compile method:

  1. Why should this argument not always be set to a very high number?
  2. What impact does setting steps_per_execution to a high number have on memory, CPU, and device resource utilization?
  3. Are there any concerns about model accuracy when using a very high steps_per_execution, or will models with different steps_per_execution values always converge to the same metrics? (In contrast, very large batch sizes can negatively impact model performance, as discussed in this discussion and paper.)
  4. For distributed strategies such as TPUStrategy, is there any concern about setting a very large steps_per_execution? When do the gradient all-reduces happen across pod devices when using large steps_per_execution values? Does the optimizer.apply_gradients behavior change with large steps_per_execution values?

Hi @river_shah, `steps_per_execution` controls how many batches are run inside each single `tf.function` call (the default is 1). It does not change how many batches are executed per epoch; it only changes how those batches are grouped into compiled calls.

Raising `steps_per_execution` reduces host-side Python overhead, because the host dispatches one compiled call covering many batches instead of one call per batch; this is especially beneficial on TPUs and for small models. The trade-off is granularity: callbacks, the progress bar, and logging only update once per execution, so with a very high value an epoch can appear to stall for long stretches between updates. A practical upper bound is the number of batches per epoch, i.e. `total_samples // batch_size`; beyond that there is nothing left to group, and one execution covers the entire epoch.
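As a sketch, a small helper (the name `choose_steps_per_execution` is my own, not a Keras API) can cap a requested value at the batches-per-epoch upper bound before passing it to `compile`:

```python
# Hypothetical helper: cap steps_per_execution at the number of full
# batches per epoch, the practical upper bound discussed above.
def choose_steps_per_execution(num_samples, batch_size, requested):
    steps_per_epoch = num_samples // batch_size  # full batches per epoch
    return min(requested, steps_per_epoch)

spe = choose_steps_per_execution(num_samples=50_000, batch_size=32,
                                 requested=10_000)
# spe == 1562 here, i.e. one compiled call covers the whole epoch.
# It would then be passed through to compile, e.g.:
# model.compile(optimizer="adam", loss="mse", steps_per_execution=spe)
```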

During training, the input pipeline may buffer the batches needed for one execution, so a larger `steps_per_execution` can increase host memory usage. Device memory is largely unaffected, since the batches are still processed one at a time inside the compiled loop.

Regarding accuracy: `steps_per_execution` does not accumulate gradients. Each batch still gets its own forward pass, backward pass, and optimizer update inside the compiled loop, so the training math is identical for any value and models should converge to the same metrics. This is unlike increasing the batch size, which does change the gradient statistics.
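A pure-Python sketch (no TensorFlow; names are illustrative, not Keras internals) of why grouping batches per execution leaves the update sequence unchanged:

```python
# Sketch: SGD updates applied per batch, with batches grouped into
# "executions" of a given size (analogous to one tf.function call).
def run_training(grads, steps_per_execution):
    w, lr = 0.0, 0.1
    for start in range(0, len(grads), steps_per_execution):
        # One "execution": loop over its grouped batches.
        for g in grads[start:start + steps_per_execution]:
            w -= lr * g  # gradient applied per batch, never accumulated
    return w

batch_grads = [0.5, -0.2, 0.3, 0.1, -0.4, 0.25]
w1 = run_training(batch_grads, steps_per_execution=1)
w4 = run_training(batch_grads, steps_per_execution=4)
assert w1 == w4  # identical weight trajectory regardless of grouping
```

The updates occur in the same order either way, so the final weights match exactly.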

For distribution strategies such as TPUStrategy, the gradient all-reduce across replicas still happens once per batch inside the compiled loop, exactly as with `steps_per_execution=1`, and `optimizer.apply_gradients` is called per batch with the reduced gradients. Its behavior does not change with large `steps_per_execution` values.

Thank You