Taming latency in data centers via active congestion-probing

Ahmed M. Abdelmoniem, Brahim Bensaou, Hengky Susanto

Research output: Chapter in Book/Conference Proceeding/Report; Conference Paper published in a book; peer-review

6 Citations (Scopus)

Abstract

In cloud environments, interactive applications deployed in data centers often generate swarms of short-lived data transfers (or flows) that compete fiercely for scarce switch buffer space with other short-lived flows as well as long-lived flows. In the presence of bloated queues, such short-lived flows often experience multiple packet losses per round-trip time, which often triggers the timeout-based loss recovery mechanism. A direct consequence is an inflated application response time that turns out to be orders of magnitude larger than it should be. A data center aware TCP protocol (DCTCP) was designed as a new TCP specifically to address this issue; however, it does not consider its coexistence with other transport protocols (e.g., CUBIC and NewReno in Linux). In such situations, which are abundant in multi-tenant data centers, the legacy large initial congestion window sizes (e.g., 10 segments) induce multiple packet losses at the onset of a TCP flow, which forces timeouts and even binary exponential backoff. In this paper, we propose a novel hypervisor-based, application-transparent approach for active congestion probing that enables the hypervisor to infer on-path congestion before the TCP connection for new traffic is fully established, so as to avoid such massive packet losses and timeouts. The so-called ProBoSCIS mechanism does not require any changes to TCP, works with all versions of TCP, and does not need any special network hardware features other than those that exist in today's data center commodity switches. We show its effectiveness via ns2 simulation and demonstrate its practical feasibility by implementing and deploying it in a small-scale data center test-bed. In a series of real experiments, we show that adopting ProBoSCIS significantly reduces application latency.
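The claim that timeouts inflate response time by orders of magnitude can be illustrated with a little arithmetic. The sketch below is not taken from the paper and does not model ProBoSCIS; it only assumes standard binary exponential backoff, where the retransmission timeout (RTO) doubles after each consecutive timeout. The function names, the 100 µs round-trip time, and the 200 ms initial RTO are illustrative assumptions.

```python
def backoff_delay(initial_rto_s: float, timeouts: int, max_rto_s: float = 60.0) -> float:
    """Total time spent waiting across `timeouts` consecutive RTO expirations.

    Assumes plain binary exponential backoff: the RTO doubles after every
    expiration, capped at `max_rto_s` (an assumed upper bound).
    """
    total = 0.0
    rto = initial_rto_s
    for _ in range(timeouts):
        total += rto                    # wait for the current RTO to expire
        rto = min(rto * 2, max_rto_s)   # double it for the next attempt
    return total


def completion_time(rtt_s: float, rounds: int, initial_rto_s: float, timeouts: int) -> float:
    """Hypothetical flow completion time: transfer rounds plus time lost to timeouts."""
    return rounds * rtt_s + backoff_delay(initial_rto_s, timeouts)


if __name__ == "__main__":
    RTT = 100e-6   # assumed data center round-trip time (100 microseconds)
    RTO = 0.2      # assumed initial retransmission timeout (200 milliseconds)
    for k in range(4):
        t = completion_time(RTT, rounds=2, initial_rto_s=RTO, timeouts=k)
        print(f"{k} timeout(s): {t * 1e3:.1f} ms")
```

Under these assumed values, a two-round flow that suffers a single timeout takes about a thousand times longer than one that suffers none, and each further consecutive timeout doubles the added wait.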

Original language: English
Title of host publication: Proceedings - 2019 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 101-110
Number of pages: 10
ISBN (Electronic): 9781728125190
Publication status: Published - Jul 2019
Event: 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019 - Richardson, United States
Duration: 7 Jul 2019 to 9 Jul 2019

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems
Volume: 2019-July

Conference

Conference: 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019
Country/Territory: United States
City: Richardson
Period: 7/07/19 to 9/07/19

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Keywords

  • Active Probing
  • Congestion Control
  • Latency
  • TCP-ECN
