TY - GEN
T1 - Multiple virtual lanes-aware MPI collective communication in multi-core clusters
AU - Li, Bo
AU - Huo, Zhigang
AU - Zhang, Panyong
AU - Meng, Dan
PY - 2009
Y1 - 2009
N2 - The widespread adoption of multi-core processors in the supercomputing arena results in multiple processes on one node competing for the limited resources of the network interface. This is especially true for collective communication in MPI. InfiniBand, a prevailing high-speed network, provides fine-grained Quality of Service (QoS) through its Virtual Lanes (VLs) mechanism. In this paper, we study the possibility of enhancing the performance of MPI collective communication by using multiple Virtual Lanes. The use of multiple VLs may equalize the priorities of simultaneous send requests, accelerate the transmission of small messages, and increase the utilization of network and memory bandwidth. These benefits speed up MPI collective communication. Factors that affect the utilization of multiple VLs are discussed as well. Evaluations show that Alltoall, Reduce, Allreduce and Reduce-scatter operations benefit from our multiple-Virtual-Lanes-aware design, with a performance improvement of about 10% to 20%. Application evaluations show that our design increases Fast Fourier Transform performance by 11% on a 1024-core cluster.
AB - The widespread adoption of multi-core processors in the supercomputing arena results in multiple processes on one node competing for the limited resources of the network interface. This is especially true for collective communication in MPI. InfiniBand, a prevailing high-speed network, provides fine-grained Quality of Service (QoS) through its Virtual Lanes (VLs) mechanism. In this paper, we study the possibility of enhancing the performance of MPI collective communication by using multiple Virtual Lanes. The use of multiple VLs may equalize the priorities of simultaneous send requests, accelerate the transmission of small messages, and increase the utilization of network and memory bandwidth. These benefits speed up MPI collective communication. Factors that affect the utilization of multiple VLs are discussed as well. Evaluations show that Alltoall, Reduce, Allreduce and Reduce-scatter operations benefit from our multiple-Virtual-Lanes-aware design, with a performance improvement of about 10% to 20%. Application evaluations show that our design increases Fast Fourier Transform performance by 11% on a 1024-core cluster.
KW - Collective communication
KW - InfiniBand
KW - Multicore
KW - Virtual lanes
UR - https://www.scopus.com/pages/publications/77952181368
U2 - 10.1109/HIPC.2009.5433199
DO - 10.1109/HIPC.2009.5433199
M3 - Conference Paper published in a book
AN - SCOPUS:77952181368
SN - 9781424449224
T3 - 16th International Conference on High Performance Computing, HiPC 2009 - Proceedings
SP - 304
EP - 311
BT - 16th International Conference on High Performance Computing, HiPC 2009 - Proceedings
T2 - 16th International Conference on High Performance Computing, HiPC 2009
Y2 - 16 December 2009 through 19 December 2009
ER -