I found that `HierarchicalCopyAllReduce` is much slower than `NcclAllReduce` in multi-GPU training. Related issue: "related issues of multi-Gpus training · Issue #971 · google/automl · GitHub". Any ideas?
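For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two all-reduce implementations could be swapped in and timed. It assumes `tf.distribute.MirroredStrategy` is the strategy in use. The helper names `CROSS_DEVICE_OPS`, `make_strategy`, and `time_steps` are made up for illustration and are not from the linked issue.

```python
import time

# Names of the tf.distribute cross-device ops classes being compared.
CROSS_DEVICE_OPS = {
    "nccl": "NcclAllReduce",
    "hierarchical_copy": "HierarchicalCopyAllReduce",
}


def make_strategy(kind):
    """Build a MirroredStrategy using the requested all-reduce implementation.

    Hypothetical helper: `kind` is one of the keys of CROSS_DEVICE_OPS.
    """
    # Imported lazily so the helpers can be defined without TensorFlow installed.
    import tensorflow as tf

    ops_cls = getattr(tf.distribute, CROSS_DEVICE_OPS[kind])
    return tf.distribute.MirroredStrategy(cross_device_ops=ops_cls())


def time_steps(step_fn, num_steps=20, warmup=5):
    """Return mean wall-clock seconds per call of step_fn, after warmup calls."""
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    return (time.perf_counter() - start) / num_steps
```

To compare, build one strategy with `make_strategy("nccl")` and one with `make_strategy("hierarchical_copy")`, create the same model under each `strategy.scope()`, and pass the training step to `time_steps`. GPU work is dispatched asynchronously, so the step function should block on a result (for example by converting the loss to a NumPy value), otherwise the timing will be misleading.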