Data Preparation for Data Mining - P17

Ever since the Sumerian and Elam peoples living in the Tigris and Euphrates River basin some 5500 years ago invented data collection using dried mud tablets marked with tax records, people have been trying to understand the meaning of, and get use from, collected data. More directly, they have been trying to determine how to use the information in that data to improve their lives and achieve their objectives.

...include points that should otherwise be excluded. Or again, in the nearest-neighbor methods, neighborhoods were unbalanced. How does preparation help? Figure 12.6 shows the data range normalized in state space on the left. The data with both range and distribution normalized is shown on the right. The range-normalized and redistributed space is a toy representation of what full data preparation accomplishes. This data is much easier to characterize: manifolds are more easily fitted, cluster boundaries are more easily found, and neighbors are more neighborly. The data is simply easier to access and work with. But what real difference does it make?

Figure 12.6 Some of the effects of data preparation: normalization of data range (left), and normalization and redistribution of the data set (right).

12.3.1 Neural Networks and the CREDIT Data Set

The CREDIT data set is a derived extract from a real-world data set. Full data preparation and surveying enable the miner to build reasonable models, reasonable in terms of addressing the business objective. But what does data preparation alone achieve in this data set? To demonstrate that, we will look at two models of the data: one built on prepared data and the other on unprepared data. Any difficulty in showing the effect of preparation alone comes from the fact that, with ingenuity, much better models can often be built with the prepared data than with the unprepared data.
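The two transformations pictured in Figure 12.6 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the book's own preparation method: range normalization is assumed to be plain min-max scaling onto [0, 1], and "redistribution" is stood in for by a simple rank-based (empirical CDF) mapping that flattens each variable toward a uniform distribution. The helper names (`range_normalize`, `redistribute`, `prepare`, `nearest`) and the income/age records are hypothetical, chosen only to show how a variable with a wide numeric range can dominate a Euclidean nearest-neighbor search until the data is prepared.

```python
from bisect import bisect_left, bisect_right


def range_normalize(values):
    """Min-max scaling: map values linearly onto [0, 1] (assumed stand-in)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # constant column: no range to scale
    return [(v - lo) / (hi - lo) for v in values]


def redistribute(values):
    """Rank-based mapping to [0, 1]; tied values share their average rank.

    The result is close to uniformly distributed, which is one simple way
    to "normalize distribution" (a hypothetical stand-in, not the book's).
    """
    n = len(values)
    if n == 1:
        return [0.5]
    ordered = sorted(values)
    out = []
    for v in values:
        first = bisect_left(ordered, v)       # lowest 0-based rank of v
        last = bisect_right(ordered, v) - 1   # highest 0-based rank of v
        out.append(((first + last) / 2) / (n - 1))
    return out


def prepare(rows, transform):
    """Apply a column transform to every variable of a list of records."""
    cols = [transform(list(col)) for col in zip(*rows)]
    return [tuple(r) for r in zip(*cols)]


def nearest(points, i):
    """Index of the Euclidean nearest neighbor of points[i]."""
    best, best_d = None, float("inf")
    for j, p in enumerate(points):
        if j == i:
            continue
        d = sum((a - b) ** 2 for a, b in zip(points[i], p))
        if d < best_d:
            best, best_d = j, d
    return best


if __name__ == "__main__":
    # Hypothetical toy records: (income, age). Income spans a far wider
    # numeric range than age, so it dominates raw distances.
    rows = [(50000, 30), (50500, 70), (52000, 31), (60000, 50)]
    print(nearest(rows, 0))                            # 1
    print(nearest(prepare(rows, range_normalize), 0))  # 2
    print(nearest(prepare(rows, redistribute), 0))     # 2
```

On the raw records, the first record's nearest neighbor is the one with almost the same income but a 40-year age gap; once both variables share a common range, the record that is close on both variables becomes the neighbor. A rank-based mapping was chosen here only because it is short and dependency-free; other redistribution schemes would serve the same illustrative purpose.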
All this demonstrates, however, is the ingenuity of the miner. To try to level the playing field, as it were, for this example, the neural network models will use all of the inputs, have the same number of nodes in the hidden layer, and use no extracted features. There is no change in network architecture between the prepared and unprepared data sets. Thus, this comparison uses no knowledge gleaned from either the data assay or the data survey. Much, if not most, of the useful information discovered about the data set and how to build better models is simply discarded so that the