Abstract
As a distributed system, Hadoop relies heavily on the network to complete data processing jobs. While Hadoop traffic is perceived to be critical for job execution performance, the actual behaviour of Hadoop network traffic is still poorly understood. This lack of understanding greatly complicates research relying on Hadoop workloads. In this paper, we explore Hadoop traffic through experimentation. We analyse the traffic generated by multiple types of MapReduce jobs, with varying input sizes and cluster configuration parameters. As a result, we present Keddah, a toolchain for capturing, modelling and reproducing Hadoop traffic, for use with network simulators. Keddah can be used to create empirical Hadoop traffic models, enabling reproducible Hadoop research in more realistic scenarios.
| Original language | English |
|---|---|
| Title of host publication | Proceedings - IEEE 37th International Conference on Distributed Computing Systems, ICDCS 2017 |
| Editors | Kisung Lee, Ling Liu |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 2143-2150 |
| Number of pages | 8 |
| ISBN (Electronic) | 9781538617915 |
| Publication status | Published - 13 Jul 2017 |
| Externally published | Yes |
| Event | 37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017 - Atlanta, United States; Duration: 5 Jun 2017 → 8 Jun 2017 |
Publication series
| Name | Proceedings - International Conference on Distributed Computing Systems |
|---|---|
Conference
| Conference | 37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017 |
|---|---|
| Country/Territory | United States |
| City | Atlanta |
| Period | 5 Jun 2017 → 8 Jun 2017 |
Bibliographical note
Publisher Copyright: © 2017 IEEE.