A Fast Parallel Algorithm for Discovering Frequent Patterns
Kawuu W. Lin
Department of Computer Science and Information Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan
linwc@

Department of Computer Science and Information Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan
kim-x@

Abstract

Fast discovery of frequent patterns is one of the most extensively discussed problems in data mining because of its wide range of applications. As the size of the database increases, the computation time and the required memory grow sharply. The difficulty of mining large databases has motivated research on parallel and distributed algorithms. Most past studies parallelize the computation by dividing the database and distributing the partitions to other nodes for mining. This approach may leak data and is therefore unsuitable for sensitive domains such as health care. In this paper, we propose a novel data mining algorithm named FD-Mine that efficiently utilizes the nodes to discover frequent patterns in cloud computing environments while preserving data privacy. In empirical evaluations under various simulation conditions, the proposed FD-Mine delivers excellent performance in terms of scalability and execution time.

Keywords: data mining, cloud computing, association rule mining, frequent pattern mining, privacy preserved

I. Introduction

With the progress of information technology, data mining techniques have been applied extensively in various domains. The goal of data mining is to discover hidden, useful information in large databases. The discovered information can support decision processes, aid commercial promotion, and so forth. Data mining includes four main topics: association rule mining [2], sequential pattern mining [3], clustering [11], and classification [5]. Among data mining studies, the problem of frequent pattern mining (association rule mining and sequential pattern mining) is the most discussed because of its wide applications. The ...

... [2] (Apriori-like) approach and 2) the frequent pattern growth approach [6] (FP-growth-like). The Apriori-like methods iteratively generate candidate itemsets of size (k+1) from frequent itemsets of size k and scan the database repeatedly to test the frequency of each candidate itemset.
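The Apriori-like procedure described in this section (generate candidate itemsets of size k+1 from the frequent itemsets of size k, then rescan the database to count them) can be illustrated with a minimal single-machine sketch. This is not FD-Mine and not the authors' code; the function name `apriori`, the parameter `min_support` (an absolute count), and the in-memory list of transactions are assumptions made only for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Illustrative Apriori-like miner (hypothetical helper, not FD-Mine).

    Returns a dict mapping each frequent itemset (a frozenset) to its
    support count. `min_support` is an absolute count, an assumption
    of this sketch.
    """
    transactions = [frozenset(t) for t in transactions]

    # First scan: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 1
    while frequent:
        # Generate candidates of size k+1 by joining frequent k-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Prune: every k-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k)):
                candidates.add(union)

        # Scan the database again to test the frequency of each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1

    return result
```

Calling `apriori(db, 3)` on a small list of item sets `db` returns every itemset contained in at least three transactions. Each iteration of the `while` loop performs one full pass over the database, which is the repeated scanning cost that the passage attributes to Apriori-like methods.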