Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 83

Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. To be able to discover and to extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered; this is the challenge created by today's abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

Haixun Wang, Philip S. Yu, and Jiawei Han

Table. Benefits (US $) using Single Classifiers and Classifier Ensembles (Original Stream)

Chunk    G0       G1=E1    G2       E2       G4       E4       G8       E8
12000    201717   203211   197946   253473   211768   269290   215692   289129
6000     103763   98777    101176   121057   102447   138565   106576   143620
4000     69447    65024    68081    80996    69346    90815    70325    96153
3000     43312    41212    42917    59293    44977    67222    46139    71660

Cost-sensitive Learning

For cost-sensitive applications, we aim at maximizing benefits. In Figure (a), we compare the single classifier approach with the ensemble approach using the credit card transaction stream. The benefits are averaged over multiple runs with chunk sizes ranging from 3000 to 12000 transactions per chunk. Starting from K = 2, the advantage of the ensemble approach becomes obvious. In Figure (b), we average the benefits of Ek and Gk (K = 2, ..., 8) for each fixed chunk size. The benefits increase with the chunk size, since more fraudulent transactions are discovered in each chunk. Again, the ensemble approach outperforms the single classifier approach.

To study the impact of concept drifts of different magnitudes, we derive data streams from the credit card transactions. The simulated stream is obtained by sorting the original 5 million transactions by their transaction amount.
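As a rough illustration of how a simulated stream of this kind could be derived, the following minimal sketch sorts transactions by amount and cuts the result into fixed-size chunks. The `Transaction` record, its field names, and both helper functions are hypothetical and introduced only for this example; the text does not describe the actual record layout or the code used in the experiments.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record layout (assumption): the source does not list
    # the fields of the credit card transaction stream.
    tx_id: int
    amount: float
    is_fraud: bool


def simulate_drifting_stream(transactions: List[Transaction]) -> List[Transaction]:
    """Return the transactions ordered by amount (ascending).

    After sorting, consecutive chunks cover different amount ranges, so the
    data distribution changes from one chunk to the next. Python's sort is
    stable, so transactions with equal amounts keep their original order.
    """
    return sorted(transactions, key=lambda t: t.amount)


def chunks(stream: List[Transaction], chunk_size: int) -> Iterator[List[Transaction]]:
    """Yield consecutive chunks of chunk_size items; the last may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(stream), chunk_size):
        yield stream[start:start + chunk_size]


if __name__ == "__main__":
    # Tiny made-up sample, purely for demonstration.
    sample = [
        Transaction(0, 50.0, False),
        Transaction(1, 5.0, True),
        Transaction(2, 20.0, False),
        Transaction(3, 5.0, False),
        Transaction(4, 100.0, False),
    ]
    for index, chunk in enumerate(chunks(simulate_drifting_stream(sample), 2)):
        print(index, [t.amount for t in chunk])
```

In an experiment like the one described, each chunk produced this way would take the place of a chunk of the original stream, with chunk sizes such as 3000 to 12000 transactions.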
We perform the same test on the simulated stream, and the results are shown in Figure (c) and (d). Detailed results of the above tests are given in the corresponding tables.

Discussion and Related Work

Data stream processing has recently become a very important research domain. Much work has been done on modeling (Babcock et al., 2002), querying (Babu and Widom, 2001; Gao and Wang, 2002; Greenwald and Khanna, 2001), and mining data streams; for instance, several papers have been published on classification (Domingos and Hulten, 2000; Hulten et al., 2001; Street and Kim, 2001), regression analysis (Chen et al., 2002), and clustering (Guha et al., 2000). Traditional Data Mining algorithms are challenged by two ...
