Abstract
Recent years have witnessed a plethora of learning-based solutions for congestion control (CC) that demonstrate better performance than traditional TCP schemes. However, they fail to provide consistently good convergence properties, including fairness, fast convergence, and stability, because their objective functions do not match these properties. Although intuitive, integrating these properties into existing learning-based CC is challenging because 1) their training environments are designed to optimize the performance of a single flow and are incapable of cooperative multi-flow optimization, and 2) there is no directly measurable metric that represents these properties in the training objective function. We present Astraea, a new learning-based congestion control scheme that ensures fast convergence to fairness with stability. At the heart of Astraea is a multi-agent deep reinforcement learning framework that explicitly optimizes these convergence properties during training by enabling the learning of interactive policies among multiple competing flows, while maintaining high performance. We further build a faithful multi-flow environment that emulates the competing behaviors of concurrent flows and explicitly expresses convergence properties so that they can be optimized during training. We have fully implemented Astraea, and our comprehensive experiments show that Astraea quickly converges to the fairness point and exhibits better stability than its counterparts. For example, Astraea achieves near-optimal bandwidth sharing (i.e., fairness) when multiple flows compete for the same bottleneck and delivers up to 8.4× faster convergence and 2.8× smaller throughput deviation, while achieving comparable or even better performance than prior solutions.
| Original language | English |
|---|---|
| Title of host publication | EuroSys 2024 - Proceedings of the 2024 European Conference on Computer Systems |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 99-114 |
| Number of pages | 16 |
| ISBN (Electronic) | 9798400704376 |
| Publication status | Published - 22 Apr 2024 |
| Event | 19th European Conference on Computer Systems, EuroSys 2024 - Athens, Greece; Duration: 22 Apr 2024 → 25 Apr 2024 |
Publication series
| Name | EuroSys 2024 - Proceedings of the 2024 European Conference on Computer Systems |
|---|---|
Conference
| Conference | 19th European Conference on Computer Systems, EuroSys 2024 |
|---|---|
| Country/Territory | Greece |
| City | Athens |
| Period | 22/04/24 → 25/04/24 |
Bibliographical note
Publisher Copyright: © 2024 ACM.
Keywords
- Congestion Control
- Reinforcement Learning
- Transport Protocol