I found that `HierarchicalCopyAllReduce` is much slower than `NcclAllReduce` when training on multiple GPUs (see the related report: Issue #971 on google/automl, GitHub). Any ideas why?
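For context, the two ops being compared are the cross-device all-reduce implementations you can pass to `tf.distribute.MirroredStrategy`. A minimal sketch of how one would switch between them (the `make_strategy` helper is hypothetical; the class names are the public `tf.distribute` API):

```python
import tensorflow as tf

def make_strategy(use_nccl: bool) -> tf.distribute.MirroredStrategy:
    """Build a MirroredStrategy with the chosen cross-device all-reduce.

    NcclAllReduce uses NVIDIA's NCCL kernels for GPU-to-GPU reduction,
    while HierarchicalCopyAllReduce reduces through a copy hierarchy,
    which can be slower on machines with fast GPU interconnects.
    """
    ops = (tf.distribute.NcclAllReduce()
           if use_nccl
           else tf.distribute.HierarchicalCopyAllReduce())
    return tf.distribute.MirroredStrategy(cross_device_ops=ops)
```

To reproduce the comparison, one would time the same training step under `make_strategy(True)` and `make_strategy(False)`; the relative speed depends on GPU count, tensor sizes, and the interconnect (NVLink vs. PCIe).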