TY - GEN
T1 - SAMR
T2 - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, 10th IEEE Int. Conf. Scalable Computing and Communications, ScalCom-2010
AU - Chen, Quan
AU - Zhang, Daqiang
AU - Guo, Minyi
AU - Deng, Qianni
AU - Guo, Song
PY - 2010
Y1 - 2010
N2 - Hadoop is seriously limited by its MapReduce scheduler, which does not scale well in heterogeneous environments. A heterogeneous environment is characterized by devices that vary greatly in computation and communication capacity, architecture, memory, and power. As an important extension of Hadoop, the LATE MapReduce scheduling algorithm takes heterogeneous environments into consideration. However, it falls short of solving the crucial problem: poor performance caused by the static manner in which it computes the progress of tasks. Consequently, neither the Hadoop scheduler nor the LATE scheduler is desirable in a heterogeneous environment. To this end, we propose SAMR, a Self-Adaptive MapReduce scheduling algorithm, which calculates the progress of tasks dynamically and adapts automatically to the continuously varying environment. When a job is committed, SAMR splits it into many fine-grained map and reduce tasks, then assigns them to a series of nodes. Meanwhile, it reads historical information, which is stored on every node and updated after every execution. SAMR then adjusts the time weight of each stage of the map and reduce tasks according to this historical information. Thus, it estimates the progress of each task accurately and finds which tasks need backup tasks. In addition, it identifies slow nodes and dynamically classifies them into sets of slow nodes. Using this information, SAMR does not launch backup tasks on slow nodes, ensuring that the backup tasks do not become slow tasks themselves. It takes the final result of each fine-grained task from whichever of the slow task or its backup task finishes first. The proposed algorithm is evaluated by extensive experiments over various heterogeneous environments. Experimental results show that SAMR reduces execution time by up to 25% compared with Hadoop's scheduler and by up to 14% compared with the LATE scheduler.
KW - Heterogeneous environment
KW - MapReduce
KW - Scheduling algorithm
KW - Self-adaptive
UR - https://openalex.org/W2041100566
UR - https://www.scopus.com/pages/publications/78249263228
U2 - 10.1109/CIT.2010.458
DO - 10.1109/CIT.2010.458
M3 - Conference Paper published in a book
SN - 9780769541082
T3 - Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010
SP - 2736
EP - 2743
BT - Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010
Y2 - 29 June 2010 through 1 July 2010
ER -