A Comprehensive View of Hadoop MapReduce Scheduling Algorithms
International Journal of Computer Networks and Communications Security
VOL. 2, NO. 9, SEPTEMBER 2014, 308–317
Available online at:
ISSN 2308-9830

Seyed Reza Pakize
Department of Computer, Islamic Azad University, Yazd Branch, Yazd, Iran
E-mail:

ABSTRACT
Hadoop is a Java-based programming framework that supports the storage and processing of large data sets in a distributed computing environment, and it is well suited to high volumes of data. It uses HDFS to store data and MapReduce to process it. MapReduce is a popular programming model for supporting data-intensive applications on shared-nothing clusters. The main objective of the MapReduce programming model is to parallelize job execution across multiple nodes. Much of the current attention of researchers and companies is directed toward Hadoop, and as a result many scheduling algorithms have been proposed in the past decades. Three important scheduling issues in MapReduce are locality, synchronization, and fairness. The most common objective of scheduling algorithms is to minimize the completion time of a parallel application while also addressing these issues. In this paper, we give an overview of Hadoop MapReduce and its scheduling issues and problems. We then survey the most popular scheduling algorithms in this field. Finally, we highlight the implementation idea, advantages, and disadvantages of these algorithms.
Keywords: Hadoop, MapReduce, Locality, Scheduling algorithm, Synchronization, Fairness.

1 INTRODUCTION
Hadoop is much more than a highly available, massive data storage engine. One of the main advantages of using Hadoop is that you can combine data storage and processing [1]. It can provide much-needed robustness and scalability to a distributed system, since Hadoop provides inexpensive and reliable storage. Hadoop uses HDFS for data