Load Balancing with Multi-Level Signals for Lossless Datacenter Networks

Jinbin Hu, Chaoliang Zeng, Zilong Wang, Junxue Zhang, Kun Guo, Hong Xu, Jiawei Huang*, Kai Chen*

*Corresponding author for this work

Research output: Contribution to journal; Journal Article; peer-review

51 Citations (Scopus)

Abstract

Various datacenter network (DCN) load balancing schemes have been proposed in the past decade. Unfortunately, most of these solutions were designed for lossy DCNs and do not work well in Priority Flow Control (PFC) enabled lossless DCNs, primarily because the individual congestion signals they rely on, e.g., link load, queue length, Round Trip Time (RTT) and Explicit Congestion Notification (ECN), may not reflect hop-by-hop PFC pausing correctly or in time. This paper first reveals these problems via extensive experiments and then, based on the insights learned, presents Proteus, a PFC-aware load balancing scheme that is resilient to PFC pausing by exploring a combination of multi-level congestion signals. At its heart, Proteus leverages RTT-level signals (i.e., RTT and link utilization) to detect path status for the initial routing decision, and exploits a sub-RTT level signal (i.e., cumulative sojourn time) to reflect instantaneous PFC pausing and make timely rerouting choices based on the idea of better-late-than-never. We have implemented Proteus in a hardware programmable switch. Our testbed experiments and large-scale simulations show that Proteus can effectively handle PFC pausing under realistic workloads, improving average flow completion time (FCT) by up to 35%, 31%, 28% and 22%, and 99th percentile FCT by up to 46%, 42%, 34% and 29%, compared with CONGA, DRILL, Hermes and MP-RDMA, respectively.
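
The two-level idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the cost function, the weights, the threshold and every name below (`PathState`, `pick_initial_path`, `maybe_reroute`, `SOJOURN_REROUTE_US`) are hypothetical and chosen only to show how an RTT-level choice and a sub-RTT rerouting check might fit together.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PathState:
    rtt_us: float            # RTT-level signal: latest measured round-trip time
    link_util: float         # RTT-level signal: link utilization in [0, 1]
    sojourn_us: float = 0.0  # sub-RTT signal: cumulative sojourn time on the path


# Hypothetical constants for illustration only; the abstract gives no values.
RTT_WEIGHT = 1.0
UTIL_WEIGHT = 100.0
SOJOURN_REROUTE_US = 50.0


def path_cost(p: PathState) -> float:
    """Combine the RTT-level signals into one score (assumed linear form)."""
    return RTT_WEIGHT * p.rtt_us + UTIL_WEIGHT * p.link_util


def pick_initial_path(paths: Dict[str, PathState]) -> str:
    """Initial routing decision: choose the path with the lowest RTT-level cost."""
    return min(paths, key=lambda name: path_cost(paths[name]))


def maybe_reroute(current: str, paths: Dict[str, PathState]) -> str:
    """Reroute when the sub-RTT signal suggests the current path is being paused."""
    if paths[current].sojourn_us <= SOJOURN_REROUTE_US:
        return current  # no sign of pausing, stay on the current path
    candidates = {
        name: p for name, p in paths.items()
        if name != current and p.sojourn_us <= SOJOURN_REROUTE_US
    }
    if not candidates:
        return current  # every alternative also looks paused
    return pick_initial_path(candidates)


paths = {
    "a": PathState(rtt_us=20.0, link_util=0.2),
    "b": PathState(rtt_us=40.0, link_util=0.1),
    "c": PathState(rtt_us=25.0, link_util=0.9),
}
chosen = pick_initial_path(paths)
paths[chosen].sojourn_us = 120.0  # simulate PFC pausing building up on that path
rerouted = maybe_reroute(chosen, paths)
```

In this sketch the initial choice uses only the slower RTT-level signals, while the rerouting check reacts to the faster sojourn-time signal, mirroring the division of roles the abstract describes.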

Original language: English
Pages (from-to): 2736-2748
Number of pages: 13
Journal: IEEE/ACM Transactions on Networking
Volume: 32
Issue number: 3
Publication status: Published - 1 Jun 2024

Bibliographical note

Publisher Copyright:
© 1993-2012 IEEE.

Keywords

  • Datacenter
  • load balancing
  • lossless networks
