Keddah: Capturing Hadoop Network Behaviour

Jie Deng, Gareth Tyson, Felix Cuadrado, Steve Uhlig

Research output: Chapter in Book/Conference Proceeding/Report; Conference Paper published in a book; peer-review

2 Citations (Scopus)

Abstract

As a distributed system, Hadoop heavily relies on the network to complete data processing jobs. While Hadoop traffic is perceived to be critical for job execution performance, the actual behaviour of Hadoop network traffic is still poorly understood. This lack of understanding greatly complicates research relying on Hadoop workloads. In this paper, we explore Hadoop traffic through experimentation. We analyse the generated traffic of multiple types of MapReduce jobs, with varying input sizes, and cluster configuration parameters. As a result, we present Keddah, a toolchain for capturing, modelling and reproducing Hadoop traffic, for use with network simulators. Keddah can be used to create empirical Hadoop traffic models, enabling reproducible Hadoop research in more realistic scenarios.

Original language: English
Title of host publication: Proceedings - IEEE 37th International Conference on Distributed Computing Systems, ICDCS 2017
Editors: Kisung Lee, Ling Liu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2143-2150
Number of pages: 8
ISBN (Electronic): 9781538617915
Publication status: Published - 13 Jul 2017
Externally published: Yes
Event: 37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017 - Atlanta, United States
Duration: 5 Jun 2017 - 8 Jun 2017

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems

Conference

Conference: 37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017
Country/Territory: United States
City: Atlanta
Period: 5/06/17 - 8/06/17

Bibliographical note

Publisher Copyright:
© 2017 IEEE.
