Towards Efficient GPU Interconnect for AI-centric Systems

  • Zhenghang REN

Student thesis: Doctoral thesis

Abstract

The rapid growth of AI applications on GPUs has significantly increased the demand for efficient GPU interconnect. These applications rely on the interconnect to transmit large volumes of data during model training and serving, and to protect sensitive information through cryptographic operations. However, existing GPU interconnects suffer from limited bandwidth, in-network congestion, and suboptimal data paths. These drawbacks degrade communication performance in distributed AI applications, where data transmission often becomes the major bottleneck.

This thesis explores novel solutions to enhance GPU interconnect efficiency during model training, serving, and privacy protection. It makes three key contributions. First, we propose FuseLink, which maximizes GPU communication bandwidth by efficiently transmitting data through multiple network interfaces over both intra- and inter-server connections, mitigating communication bottlenecks in multi-GPU systems. Second, we introduce MCC, a novel congestion control scheme that leverages message-level congestion signals to prevent the excessive rate reduction of traditional congestion control algorithms, improving communication efficiency and resiliency in AI-centric networks. Finally, we present CORA, a high-performance GPU communication framework that integrates Remote Direct Memory Access (RDMA) with cryptographic primitives such as secret sharing, enabling low-latency, privacy-preserving model training and serving across GPU clusters.
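As a toy illustration of the secret-sharing primitive mentioned above (a minimal sketch, not CORA's implementation; the function names and the 64-bit ring are illustrative assumptions), additive secret sharing splits a value into random shares so that no single share reveals anything, while the sum of all shares recovers the secret:

```python
import secrets

MOD = 2 ** 64  # share arithmetic over a 64-bit ring (illustrative choice)

def share(value, n=2):
    """Split `value` into n additive shares that sum to it mod MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    """Recover the secret by summing all shares mod MOD."""
    return sum(shares) % MOD

# Each party holds one share; only the combination reveals the secret.
s = share(42, n=3)
assert reconstruct(s) == 42
```

In a privacy-preserving training or serving setting, each party would transmit only its shares over the interconnect, so efficient bulk transfer of share data (e.g., via RDMA) directly determines end-to-end performance.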

Together, these contributions advance the state of the art in GPU interconnect design, addressing communication-efficiency challenges in AI systems.

Date of Award: 2025
Original language: English
Awarding Institution
  • The Hong Kong University of Science and Technology
Supervisor: Kai CHEN
