Abstract
In cloud environments, interactive applications deployed in data centers often generate swarms of short-lived data transfers (or flows) that compete fiercely for scarce switch buffer space with other short-lived flows as well as long-lived flows. In the presence of bloated queues, such short-lived flows often experience multiple packet losses per round-trip time, which frequently triggers timeout-based loss recovery. A direct consequence is an inflated application response time that can be orders of magnitude larger than it should be. Data Center TCP (DCTCP) was designed specifically to address this issue; however, it does not consider coexistence with other transport protocols (e.g., CUBIC and NewReno in Linux). In such situations, which are abundant in multi-tenant data centers, the legacy large initial congestion window sizes (e.g., 10 segments) induce multiple packet losses at the onset of a TCP flow, which forces timeouts and even binary exponential backoff. In this paper, we propose a novel hypervisor-based, application-transparent approach to active congestion probing that enables the hypervisor to infer on-path congestion before a new TCP connection is fully established, and thereby to avoid such massive packet losses and timeouts. The so-called ProBoSCIS mechanism requires no changes to TCP, works with all versions of TCP, and needs no special network hardware features beyond those available in today's commodity data center switches. We show its effectiveness via ns-2 simulation and demonstrate its practical feasibility by implementing and deploying it in a small-scale data center testbed. A series of real experiments shows that adopting ProBoSCIS significantly reduces application latency.
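The abstract does not spell out how probe results are turned into a decision, so the following is only a minimal sketch of the general idea, not the ProBoSCIS algorithm itself. It assumes, hypothetically, that the hypervisor sends a small batch of probes before the handshake completes, counts how many come back carrying a congestion mark (for example, via ECN), and caps the new flow's initial window accordingly. The function name, the linear scaling policy, and the default bounds are all illustrative assumptions.

```python
def initial_window_from_probes(sent: int, marked: int,
                               max_iw: int = 10, min_iw: int = 1) -> int:
    """Hypothetical policy: shrink the initial window (in segments) in
    proportion to the share of probes that returned congestion-marked.

    sent   -- number of probes sent before the connection is established
    marked -- number of those probes that came back marked as congested
    max_iw -- window used when no probe is marked (assumed default: 10)
    min_iw -- floor so that the flow can always send at least one segment
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= marked <= sent:
        raise ValueError("marked must be between 0 and sent")
    # Integer arithmetic avoids floating-point rounding surprises.
    scaled = max_iw * (sent - marked) // sent
    return max(min_iw, min(max_iw, scaled))


if __name__ == "__main__":
    # Print the window chosen for each possible number of marked probes.
    for marked in range(0, 11):
        print(marked, initial_window_from_probes(sent=10, marked=marked))
```

With an uncongested path the flow keeps the full 10-segment window, while a fully marked probe batch reduces it to a single segment. A real mechanism would also need a way to enforce the cap without modifying the guest TCP stack, which this sketch leaves out.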
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2019 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 101-110 |
| Number of pages | 10 |
| ISBN (Electronic) | 9781728125190 |
| Publication status | Published - Jul 2019 |
| Event | 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019 - Richardson, United States. Duration: 7 Jul 2019 → 9 Jul 2019 |
Publication series
| Name | Proceedings - International Conference on Distributed Computing Systems |
|---|---|
| Volume | 2019-July |
Conference
| Conference | 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019 |
|---|---|
| Country/Territory | United States |
| City | Richardson |
| Period | 7/07/19 → 9/07/19 |
Bibliographical note
Publisher Copyright: © 2019 IEEE.
Keywords
- Active Probing
- Congestion Control
- Latency
- TCP-ECN