Đang chuẩn bị liên kết để tải về tài liệu:
Data Mining and Knowledge Discovery Handbook, 2 Edition part 19

Đang chuẩn bị nút TẢI XUỐNG, xin hãy chờ

Data Mining and Knowledge Discovery Handbook, 2 Edition part 19. Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. To be able to discover and to extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered – this is the challenge created by today’s abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery. | 160 Lior Rokach and Oded Maimon However this correction still produces an optimistic error rate. Consequently one should consider pruning an internal node t if its error rate is within one standard error from a reference tree namely Quinlan 1993 e pruned T t S e T S i s T S - 1 - e T S s The last condition is based on statistical confidence interval for proportions. Usually the last condition is used such that T refers to a sub-tree whose root is the internal node t and S denotes the portion of the training set that refers to the node t. The pessimistic pruning procedure performs top-down traversing over the internal nodes. If an internal node is pruned then all its descendants are removed from the pruning process resulting in a relatively fast pruning. 9.6.6 Error-based Pruning EBP Error-based pruning is an evolution of pessimistic pruning. It is implemented in the well-known C4.5 algorithm. As in pessimistic pruning the error rate is estimated using the upper bound of the statistical confidence interval for proportions. T S T S Z J y S where e T S denotes the misclassification rate of the tree T on the training set S. Z is the inverse of the standard normal cumulative distribution and a is the desired significance level. Let subtree Tt denote the subtree rooted by the node t. Let maxchild Tt denote the most frequent child node of t namely most of the instances in S reach this particular child and let St denote all instances in S that reach the node t. The procedure performs bottom-up traversal over all nodes and compares the following values 1. eUB subtree T t St 2. eUB pruned subtree T t t St 3. UB subtree T maxchild T t Smaxchild T t According to the lowest value the procedure either leaves the tree as is prune the node t or replaces the node t with the subtree rooted by maxchild Tt . 9.6.7 Optimal Pruning The issue of finding optimal pruning has been studied in Bratko and Bohanec 1994 and Almuallim 1996 . The first research introduced an algorithm which .