tailieunhanh - John wiley sons data mining techniques for marketing sales_7

Các mô hình là 100% chính xác trên các dữ liệu thử nghiệm không thấy. Nói cách khác, nó đã phát hiện ra các quy tắc chính xác được sử dụng bởi Sâu để phân loại những tuyên bố. Về vấn đề này, một công cụ mạng lưới thần kinh đã không thành công. | 176 Chapter 6 claims were paid automatically. The results were startling The model was 100 percent accurate on unseen test data. In other words it had discovered the exact rules used by Caterpillar to classify the claims. On this problem a neural network tool was less successful. Of course discovering known business rules may not be particularly useful it does however underline the effectiveness of decision trees on rule-oriented problems. Many domains ranging from genetics to industrial processes really do have underlying rules though these may be quite complex and obscured by noisy data. Decision trees are a natural choice when you suspect the existence of underlying rules. Measuring the Effectiveness Decision Tree The effectiveness of a decision tree taken as a whole is determined by applying it to the test set a collection of records not used to build the tree and observing the percentage classified correctly. This provides the classification error rate for the tree as a whole but it is also important to pay attention to the quality of the individual branches of the tree. Each path through the tree represents a rule and some rules are better than others. At each node whether a leaf node or a branching node we can measure The number of records entering the node The proportion of records in each class How those records would be classified if this were a leaf node The percentage of records classified correctly at this node The variance in distribution between the training set and the test set Of particular interest is the percentage of records classified correctly at this node. Surprisingly sometimes a node higher up in the tree does a better job of classifying the test set than nodes lower down. Tests for Choosing the Best Split A number of different measures are available to evaluate potential splits. Algorithms developed in the machine learning community focus on the increase in purity resulting from a split while those developed in the statistics community .

TỪ KHÓA LIÊN QUAN