Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 82

Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges, and applications of data mining (DM) and knowledge discovery. Knowledge discovery demonstrates intelligent computing at its best and is one of the most desirable end products of information technology. Discovering and extracting knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish; today's abundance of data holds a great deal of hidden knowledge waiting to be discovered.

790 Haixun Wang, Philip S. Yu, and Jiawei Han

Incremental or online data-mining methods (Utgoff, 1989; Gehrke et al., 1999) are another option for mining data streams. These methods continuously revise and refine a model by incorporating new data as they arrive. However, to guarantee that the incrementally trained model is identical to the model trained in batch mode, most online algorithms rely on a costly model-updating procedure, which sometimes makes learning even slower than in batch mode. Recently, an efficient incremental decision-tree algorithm called VFDT was introduced by Domingos and Hulten (2000). For streams of discrete-valued data, Hoeffding bounds guarantee that the model output by VFDT is asymptotically nearly identical to that of a batch learner.

The algorithms mentioned above, including incremental and online methods such as VFDT, all produce a single model that represents the entire data stream. Such a model suffers in prediction accuracy in the presence of concept drift, because the streaming data are not generated by a stationary stochastic process: the future examples we need to classify may have a very different distribution from the historical data.
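The Hoeffding bound that underlies VFDT's split decisions can be stated concretely: after n observations of a random variable with range R, the observed mean is within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the true mean with probability at least 1 - delta. The sketch below illustrates the VFDT-style split test under this bound; the function names and the example numbers are illustrative, not taken from the VFDT implementation itself.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, after n samples of a variable with the given
    range, the observed mean is within epsilon of the true mean with
    probability at least 1 - delta."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))


def should_split(gain_best: float, gain_second: float,
                 value_range: float, delta: float, n: int) -> bool:
    """VFDT-style test: split a leaf when the best attribute's
    information-gain lead over the runner-up exceeds the Hoeffding
    bound, so the choice is the same a batch learner would make
    with high probability."""
    return (gain_best - gain_second) > hoeffding_bound(value_range, delta, n)


# With 1000 examples at a leaf and delta = 0.05, the bound is small
# enough that a clear gain gap (0.30 vs. 0.20) justifies a split,
# while a narrow gap (0.25 vs. 0.24) does not yet.
eps = hoeffding_bound(1.0, 0.05, 1000)
```

Note that as n grows, epsilon shrinks toward zero, which is why VFDT's tree converges to the batch learner's tree: with enough examples, even tiny gain differences become statistically decisive.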
To make time-critical predictions, the model learned from the streaming data must be able to capture transient patterns in the stream. To do this, as we revise the model by incorporating new examples, we must also eliminate the effects of examples that represent outdated concepts. This is a non-trivial task. The challenges of maintaining an accurate and up-to-date classifier for infinite data streams with concept drift include the following. Accuracy: it is difficult to decide which examples represent outdated concepts and hence should have their effects excluded from the model. A commonly used approach is to forget examples at a constant rate. However, a higher forgetting rate lowers the accuracy of the up-to-date model, as it is supported by a less …
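The constant-rate forgetting strategy and its trade-off can be sketched with a toy sliding-window learner. The class below is an assumption for illustration only (a deliberately simple majority-class predictor, not a method from the chapter): old examples fall out of the window at a constant rate, so a small window adapts quickly to drift but rests its predictions on less evidence.

```python
from collections import Counter, deque


class WindowedMajorityClassifier:
    """Toy learner that forgets at a constant rate: it retains only the
    most recent `window` labels and always predicts their majority
    class. Shrinking the window speeds up adaptation to concept drift
    but leaves the model supported by fewer examples."""

    def __init__(self, window: int):
        self.labels = deque(maxlen=window)  # oldest label evicted on overflow
        self.counts = Counter()

    def update(self, label) -> None:
        if len(self.labels) == self.labels.maxlen:
            # Forget the effect of the oldest (possibly outdated) example.
            self.counts[self.labels[0]] -= 1
        self.labels.append(label)
        self.counts[label] += 1

    def predict(self):
        # Majority class among the examples still inside the window.
        return self.counts.most_common(1)[0][0] if self.labels else None
```

Feeding the stream "a", "a", "b" into a window of three yields the prediction "a"; after two more "b" labels the old "a" examples have been forgotten and the prediction flips to "b", which is exactly the behavior a drift-aware model needs and a single whole-stream model lacks.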
