Abstract
Big data analytics in datacenters often involves scheduling data-parallel jobs, which are bottlenecked by the limited bandwidth of datacenter networks. To alleviate the bandwidth shortage, some existing work has proposed traffic compression to reduce the amount of data transmitted over the network. However, the proposed traffic compression works in a coarse-grained manner at the job level, leaving a large optimization space unexplored for further performance improvement. In this paper, we propose a flow-level traffic compression and scheduling system, called Swallow, to accelerate data-intensive applications. Specifically, we target coflows, an elegant abstraction of the parallel flows generated by big data jobs. With the objective of minimizing coflow completion time (CCT), we propose a heuristic algorithm called Fastest-Volume-Disposal-First (FVDF) and implement Swallow based on Spark. The results of both trace-driven simulations and real experiments show the superiority of our system over existing algorithms. Swallow can reduce CCT and job completion time (JCT) by up to 1.47× and 1.66× on average, respectively, compared with SEBF in Varys, one of the most efficient coflow scheduling algorithms so far. Moreover, with coflow compression, Swallow reduces data traffic by up to 48.41% on average.
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 505-514 |
| Number of pages | 10 |
| ISBN (Print) | 9781538643686 |
| Publication status | Published - 3 Aug 2018 |
| Externally published | Yes |
| Event | 32nd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018 - Vancouver, Canada; Duration: 21 May 2018 → 25 May 2018 |
Publication series
| Name | Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018 |
|---|---|
Conference
| Conference | 32nd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018 |
|---|---|
| Country/Territory | Canada |
| City | Vancouver |
| Period | 21/05/18 → 25/05/18 |
Bibliographical note
Publisher Copyright: © 2018 IEEE.
Keywords
- Big Data
- Coflow Scheduling
- Datacenter Networks
- Traffic Compression