I found that HierarchicalCopyAllReduce is much slower than NcclAllReduce; a related report on multi-GPU training is Issue #971 in google/automl on GitHub. Any ideas?
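For context, both of these are `tf.distribute.CrossDeviceOps` implementations that can be passed to `MirroredStrategy`: `NcclAllReduce` uses NVIDIA's NCCL library for GPU-to-GPU reductions, while `HierarchicalCopyAllReduce` routes copies hierarchically (useful on topologies like DGX-1, but often slower elsewhere). A minimal sketch of how the choice is configured (the actual training loop and model are omitted; on a CPU-only machine this just creates a single replica):

```python
import tensorflow as tf

# Two cross-device all-reduce implementations in tf.distribute:
nccl_ops = tf.distribute.NcclAllReduce()
hierarchical_ops = tf.distribute.HierarchicalCopyAllReduce()

# Pass whichever one you want to benchmark to MirroredStrategy.
# (NcclAllReduce is the default on GPUs in recent TF versions.)
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=hierarchical_ops)
print("Replicas in sync:", strategy.num_replicas_in_sync)
```

Timing the same training step under each strategy (on the same hardware and batch size) is the usual way to confirm which cross-device op is faster for a given GPU topology.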