Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 99

Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. To be able to discover and to extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered – this is the challenge created by today’s abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

Lior Rokach

The ensemble methodology is applicable in many fields, such as finance (Leigh et al., 2002), bioinformatics (Tan et al., 2003), healthcare (Mangiameli et al., 2004), manufacturing (Maimon and Rokach, 2004), geography (Bruzzone et al., 2004), etc. Given the potential usefulness of ensemble methods, it is not surprising that a vast number of methods are now available to researchers and practitioners. This chapter aims to organize all significant methods developed in this field into a coherent and unified catalog. Several factors differentiate the various ensemble methods. The main factors are:

1. Inter-classifier relationship: How does each classifier affect the other classifiers? Ensemble methods can be divided into two main types: sequential and concurrent.
2. Combining method: The strategy for combining the classifiers generated by an induction algorithm. The simplest combiner determines the output solely from the outputs of the individual inducers. Ali and Pazzani (1996) compared several combination methods: uniform voting, Bayesian combination, distribution summation, and likelihood combination. Moreover, theoretical analysis has been developed for estimating the classification improvement (Tumer and Ghosh, 1999). Along with simple combiners, there are more sophisticated methods, such as stacking (Wolpert, 1992) and arbitration (Chan and Stolfo, 1995).
3. Diversity generator: To make the ensemble efficient, there should be some sort of diversity between the classifiers. Diversity may be obtained through different presentations of the input data (as in bagging), variations in learner design, or by adding a penalty to the outputs to encourage diversity.
4. Ensemble size: The number of classifiers in the ensemble.

The following sections discuss and describe each of these factors.

Sequential Methodology

In sequential approaches for learning ensembles, there is an interaction between the learning runs. Thus it is possible to take advantage of .
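
As an illustration of the simple combiners named under the combining-method factor, the following is a minimal sketch of uniform voting and distribution summation. It is not taken from the chapter: the function names, the dictionary representation of each classifier's class distribution, and the sample numbers are all assumptions made for the example.

```python
from collections import Counter


def uniform_voting(predicted_labels):
    """Return the label chosen by the largest number of classifiers.

    Every classifier contributes exactly one vote, regardless of how
    confident it is. Ties go to the label that was seen first.
    """
    counts = Counter(predicted_labels)
    return counts.most_common(1)[0][0]


def distribution_summation(distributions):
    """Sum the per-class probabilities over all classifiers and return
    the class with the largest total.

    `distributions` is a list of dicts mapping class label -> probability,
    one dict per classifier (a hypothetical representation).
    """
    totals = {}
    for dist in distributions:
        for label, probability in dist.items():
            totals[label] = totals.get(label, 0.0) + probability
    return max(totals, key=totals.get)


# Hypothetical outputs of three classifiers for a single instance.
outputs = [
    {"a": 0.90, "b": 0.10},  # strongly favours "a"
    {"a": 0.40, "b": 0.60},  # mildly favours "b"
    {"a": 0.45, "b": 0.55},  # mildly favours "b"
]

# Each classifier's single most probable label, used for voting.
votes = [max(dist, key=dist.get) for dist in outputs]

voted_label = uniform_voting(votes)
summed_label = distribution_summation(outputs)
```

With these sample numbers the two combiners disagree: two of the three classifiers vote for "b", while the summed probabilities favour "a" (1.75 against 1.25), because distribution summation takes each classifier's confidence into account.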
