Parallel Mining for High Utility Itemsets Mining by Efficient Data Structure

Research and Development on Information and Communication Technology

Nguyen Manh Hung (1) and Dau Hai Phong (2)
1 Military Technical Academy, Hanoi, Vietnam
2 Thang Long University, Hanoi, Vietnam
E-mail: manhhungk12@, phong4u@
Correspondence: Dau Hai Phong
Communication: received 5 July 2017, revised 8 August 2017, accepted 16 August 2017

Abstract: Mining high utility itemsets in transaction databases is an important data mining task that is widely applied in many areas. Many algorithms have been proposed recently, but most of them must first generate candidate sets by overestimating utility and then calculate the exact utility values. As a result, the number of candidate itemsets is much larger than the actual number of high utility itemsets. In this paper, we introduce the Retail Transaction-Weighted Utility (RTWU) structure and propose two algorithms: the EAHUI-Miner algorithm and the parallel PEAHUI-Miner algorithm. They were evaluated experimentally against the two most efficient algorithms, EFIM and FHM. The results show that our algorithms are better on sparse datasets.

Algorithms such as HUI-Miner [7], EFIM [8] and FHM [9] attempted to directly search for high utility itemsets in only one phase.

In 2012, Liu et al. proposed the HUI-Miner algorithm [7], which uses a utility-list structure to store the utility information of each itemset together with information for reducing the search space. Unlike previous algorithms, HUI-Miner does not generate candidate high utility itemsets. After constructing the initial utility-lists from the mined database, HUI-Miner can mine high utility itemsets from these utility-lists. In 2015, Zida and Fournier-Viger proposed the EFIM algorithm [8], which relies on two upper bounds, named subtree utility and local utility, to prune the search space more effectively. It also introduced a novel array-based utility counting technique.
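
To make the utility-list idea more concrete, the following is a minimal sketch of how initial utility-lists for single items might be built from a small transaction database. It follows the general description of HUI-Miner-style utility-lists (one entry per transaction, holding the item's own utility and the utility of the items that come after it in a fixed order). It is not the RTWU structure or the EAHUI-Miner algorithm of this paper. The toy database, the profit table, and the names `transactions`, `profit`, `transaction_utility`, `twu` and `build_utility_lists` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy database: transaction id -> {item: purchased quantity}.
transactions = {
    1: {"a": 1, "b": 2, "c": 1},
    2: {"a": 2, "c": 3},
    3: {"b": 1, "d": 2},
}
# Hypothetical unit profit of each item.
profit = {"a": 5, "b": 2, "c": 1, "d": 4}


def transaction_utility(tx):
    """Total utility of one transaction: sum of quantity * unit profit."""
    return sum(qty * profit[item] for item, qty in tx.items())


def twu(item):
    """Transaction-weighted utility: sum of the utilities of all
    transactions that contain the item (an overestimate of its utility)."""
    return sum(transaction_utility(tx)
               for tx in transactions.values() if item in tx)


def build_utility_lists(order):
    """Build one utility-list per item.

    Each entry is (tid, iutil, rutil):
      iutil = utility of the item in that transaction,
      rutil = utility of the items that follow it in `order`
              within the same transaction (the "remaining" utility).
    """
    rank = {item: position for position, item in enumerate(order)}
    lists = defaultdict(list)
    for tid, tx in transactions.items():
        remaining = transaction_utility(tx)
        for item in sorted(tx, key=rank.__getitem__):
            iutil = tx[item] * profit[item]
            remaining -= iutil
            lists[item].append((tid, iutil, remaining))
    return dict(lists)


# Assumed processing order: ascending TWU, ties broken alphabetically.
order = sorted(profit, key=lambda item: (twu(item), item))
utility_lists = build_utility_lists(order)


def utility(item):
    """Exact utility of a single item, read directly from its list."""
    return sum(iutil for _, iutil, _ in utility_lists[item])


def extension_bound(item):
    """Sum of iutil + rutil: an upper bound on the utility of any
    itemset that extends `item` with items later in `order`."""
    return sum(iutil + rutil for _, iutil, rutil in utility_lists[item])
```

In this sketch, an item whose `extension_bound` falls below the minimum utility threshold could be dropped together with all of its extensions, which is the kind of search-space reduction the utility-list is meant to support without a separate candidate-generation phase.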
