Data Mining over Large Datasets Using Hadoop in Cloud Environment

ISSN: 2249-5789. V. Nappinna Lakshmi et al., International Journal of Computer Science & Communication Networks, Vol 3(2), 73-78

DATA MINING OVER LARGE DATASETS USING HADOOP IN CLOUD ENVIRONMENT

V. Nappinna Lakshmi 1, N. Revathi 2*
1 PG Scholar, 2 Assistant Professor
Department of Information Technology, Sri Venkateswara College of Engineering, Sriperumbudur – 602105, Chennai, INDIA.
1 Nappinnavenkat@ 2 revathi@*
* Corresponding author

Abstract- Data in web applications and social networking is growing drastically, and such data is referred to as Big Data. Hive queries integrated with Hadoop are used to generate report analyses over thousands of datasets, but retrieving those datasets takes a large amount of time, and the approach lacks performance analysis. To overcome this problem, Market Basket Analysis, a very popular data mining algorithm, is used in the Amazon cloud environment by integrating it with the Hadoop ecosystem and HBase. The objective is to store the data persistently, along with the past history of the dataset, and to perform report analysis on it. The main aim of this system is to improve performance through parallelization of operations such as loading the data, building indexes, and evaluating queries. The performance analysis is therefore done with a minimum of three nodes within the Amazon cloud environment. HBase is an open-source, non-relational, distributed database. It runs on top of Hadoop. It stores data as a single key with multiple values.
Looping is avoided when retrieving a particular item from huge datasets, which reduces the execution time. HDFS is used to store the data after the MapReduce operations are performed, and the execution time decreases as the number of nodes increases. Performance is tuned with parameters such as the HBase heap memory and the caching parameter. Keywords- HBase, Cloud computing, .
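
The abstract names Market Basket Analysis run as MapReduce operations but does not show the algorithm. As a rough illustration only, the pair-counting core of Market Basket Analysis can be sketched as a map step and a reduce step in plain Python. The function names, the in-memory driver, and the sample baskets are all hypothetical and are not taken from the paper; a real Hadoop job would distribute the same two steps across nodes.

```python
from collections import defaultdict
from itertools import combinations


def map_pairs(transaction):
    """Map step: emit ((item_a, item_b), 1) for every unordered item pair."""
    items = sorted(set(transaction))  # dedupe and fix the order of each pair
    for pair in combinations(items, 2):
        yield pair, 1


def reduce_counts(mapped):
    """Reduce step: sum the emitted counts for each pair key."""
    totals = defaultdict(int)
    for key, count in mapped:
        totals[key] += count
    return dict(totals)


def pair_support(transactions):
    """Hypothetical in-memory driver: run map, then reduce, over all baskets."""
    mapped = (kv for t in transactions for kv in map_pairs(t))
    return reduce_counts(mapped)


# Hypothetical sample baskets, not data from the paper.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
]
counts = pair_support(baskets)
```

Because each basket is mapped independently and the reduce step only sums per key, the work can be split across nodes, which is consistent with the abstract's statement that execution time decreases as nodes are added.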
