Data Preparation for Data Mining- P15


Data Preparation for Data Mining - P15: Ever since the Sumerian and Elam peoples living in the Tigris and Euphrates River basin some 5500 years ago invented data collection using dried mud tablets marked with tax records, people have been trying to understand the meaning of, and get use from, collected data. More directly, they have been trying to determine how to use the information in that data to improve their lives and achieve their objectives.

... sample is small, the miner can establish that details of most of the car models available in the U.S. for the period covered are actually in the data set.

Predicting Origin

Information metrics

Figure 11.16 shows an extract of the information provided by the survey. The cars in the data set may originate from Europe, Japan, or the U.S. Predicting the cars' origins should be relatively easy, particularly given the brand of each car. But what does the survey have to say about this data set for predicting a car's origin?

[Survey report: signal and channel entropy measures with their ratios, followed by a table of variables (BRAND, CU_IN, WT_LBS, HPWR, CYL, YEAR, ORIGIN) and their importance to the output.]

Figure 11.16 Extract of the data survey report for the CARS data set when predicting the cars' ORIGIN. Cars may originate from Japan, the U.S., or Europe.

First of all, sH(X) and sH(Y) are both fairly close to 1, showing that there is a reasonably good spread of signals in the input and output.
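The "close to 1" reading of a signal entropy ratio can be illustrated with a small sketch. It assumes the ratio is the Shannon entropy of the observed states divided by the maximum entropy possible for that number of states, which the excerpt does not spell out. The function name `entropy_ratio` and the origin counts are hypothetical and are not taken from the CARS data set.

```python
import math
from collections import Counter


def entropy_ratio(labels):
    """Return (entropy in bits, entropy / maximum possible entropy).

    Assumption: the survey's ratio is taken to be entropy normalized by
    log2(number of distinct states). This is an illustrative reading,
    not necessarily the survey tool's exact definition.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    # Shannon entropy of the observed state frequencies.
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Maximum entropy is reached when all states are equally likely.
    h_max = math.log2(len(counts)) if len(counts) > 1 else 0.0
    # With a single state there is no uncertainty; report a ratio of 0.
    ratio = h / h_max if h_max > 0 else 0.0
    return h, ratio


# Hypothetical origin labels, not the real CARS counts.
balanced = ["US"] * 10 + ["Europe"] * 10 + ["Japan"] * 10
skewed = ["US"] * 20 + ["Europe"] * 5 + ["Japan"] * 5

h_bal, r_bal = entropy_ratio(balanced)
h_skw, r_skw = entropy_ratio(skewed)
print(f"balanced: H = {h_bal:.4f} bits, ratio = {r_bal:.4f}")
print(f"skewed:   H = {h_skw:.4f} bits, ratio = {r_skw:.4f}")
```

Under this assumption, evenly distributed states give a ratio of essentially 1, and the ratio falls as one state comes to dominate, so a value somewhat below 1 points to a mild imbalance.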
The sH(Y) ratio is somewhat less than 1, and looking at the data itself will easily show that the numbers of cars from each of the originating areas are not exactly balanced. But it is very hard indeed for a miner to look at the actual input states to see if they are balanced, whereas the sH(X) entropy shows clearly that they are. This is a piece of very useful information that is not easily discovered by inspecting the data itself.

Looking at the channel measures is very instructive. The signal and channel H(X) are identical, and signal and channel H(Y) are close. All of the ...