Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 68
Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. Discovering and extracting knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered: this is the challenge created by today's abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

650 Paolo Giudici

Table 32.3. Calculations for the threshold chart

Cut-off (%)  Accuracy A    Freq. A  Accuracy B    Freq. B  Accuracy C    Freq. C
95           0             1        0             1        0             1
90           0             1        0             1        0             1
85           0             1        0             1        0             1
80           0             1        0             1        0             1
75           0             1        0             1        0             1
70           0             1        0             1        0             1
65           0             1        0             1        0             1
60           0             1        0             1        0             2
55           0             2        0             1        0             2
50           0.6666666667  6        0             1        0             2
45           0.5714285714  7        0             2        0             2
40           0.6666666667  9        0             4        0             2
35           0.6111111111  18       0             8        0             2
30           0.4642857143  28       0.4230769231  26       0             8
25           0.3902439024  41       0.3673469388  49       0             18
20           0.298245614   57       0.3529411765  51       0.3513513514  37
15           0.2352941176  102      0.2871287129  101      0.2857142857  56
10           0.1833333333  180      0.2402597403  154      0.2364864865  148
5            0.1136363636  396      0.1076555024  418      0.1415384615  325

Freq. is the number of enterprises the model classifies as bad at that cut-off; Accuracy is the share of those enterprises that are actually bad.

Fig. 32.2. Threshold charts of the models (figure not reproduced)

… of which 5% (i.e. 83) are bad and 95% (i.e. 1556) are good. Looking at model A and considering a cut-off level of 5%, notice that the model classifies 396 enterprises as bad. Clearly this figure is much higher than the actual number of bad enterprises (83), and consequently the accuracy rate of the model will be low. Indeed, of the 396 enterprises estimated as bad, only 45 actually are, which gives an accuracy rate of 11.36% (45/396) for the model. Model A reaches its maximum accuracy (66.67%) for cut-offs equal to 40% and 50%. Similar conclusions can be drawn for the other two models.
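The threshold-chart calculation described above can be sketched in a few lines. This is a minimal Python illustration, not the chapter's own procedure: it assumes each model outputs a predicted probability of default per enterprise, and that an enterprise is classified as bad when that probability is at least the cut-off (the text does not say whether the comparison is strict). The function name `threshold_chart` and the toy scores are hypothetical; only the 45-out-of-396 figure for model A comes from the text.

```python
from typing import List, Sequence, Tuple


def threshold_chart(
    scores: Sequence[float],
    is_bad: Sequence[bool],
    cutoffs: Sequence[float],
) -> List[Tuple[float, float, int]]:
    """Return one (cut-off, accuracy, freq) row per cut-off.

    Assumption: an enterprise is classified as bad when its predicted
    probability of default is >= the cut-off (cut-offs given as
    probabilities, e.g. 0.05 for a 5% cut-off).
    freq     = number of enterprises classified as bad
    accuracy = share of those that are actually bad (0.0 when freq == 0,
               a convention chosen for this sketch only)
    """
    if len(scores) != len(is_bad):
        raise ValueError("scores and is_bad must have the same length")
    rows = []
    for cutoff in cutoffs:
        # Actual labels of the enterprises the model flags as bad.
        flagged = [bad for score, bad in zip(scores, is_bad) if score >= cutoff]
        freq = len(flagged)
        hits = sum(flagged)  # each True counts as 1
        accuracy = hits / freq if freq else 0.0
        rows.append((cutoff, accuracy, freq))
    return rows


if __name__ == "__main__":
    # Hypothetical toy scores and labels, not data from the chapter.
    scores = [0.9, 0.7, 0.45, 0.3, 0.2, 0.05]
    is_bad = [True, False, True, False, False, False]
    for row in threshold_chart(scores, is_bad, [0.5, 0.25, 0.05]):
        print(row)

    # Model A at the 5% cut-off (figures from the text): 45 of 396 flagged are bad.
    print(f"{45 / 396:.2%}")  # 11.36%
```

Applied to the predicted probabilities of models A, B and C over the cut-offs 0.95, 0.90, ..., 0.05, such a function would produce columns with the same structure as Table 32.3. Conversely, multiplying each accuracy in the table by its frequency recovers the number of correctly flagged bad enterprises (for example, 0.1136363636 × 396 ≈ 45).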
32 Data Mining Model Comparison 651

To summarize, from the Response Threshold Chart we can state that, for the examined dataset, for low levels of the cut-off (i.e. up to 15%) the highest accuracy rates are those of Reg-3 (model C), while for higher levels of the cut-off (between 20% and 55%) model A shows greater accuracy in predicting the occurrence of default (bad) situations. In the light of these considerations, it seems natural to ask which of the three is actually the best model. This question does not have a unique answer: the answer depends on the cut-off level considered most appropriate for the business problem at hand. In our case, since default is a rare event, a low cut-off is …