Data Preparation for Data Mining - P16

Data Preparation for Data Mining - P16: Ever since the Sumerian and Elam peoples, living in the Tigris and Euphrates River basin some 5500 years ago, invented data collection using dried mud tablets marked with tax records, people have been trying to understand the meaning of, and get use from, collected data. More directly, they have been trying to determine how to use the information in that data to improve their lives and achieve their objectives.

Figure: Fitting manifolds, either inflexible (linear regression) or flexible (neural network), to the sample data results in a manifold that in some sense "best fits" the data.

These methods work by creating a mathematical expression that characterizes the state of the fitted line at any point along the line. Studying the nature of the manifold leads to inferences about the data. When predicting values for some particular point, linear regression uses the closest point on the manifold to the particular point to be predicted. The characteristics (the value of the feature to predict) of the nearby point on the manifold are used as the desired prediction.

Prepared Data and Modeling Algorithms

These capsule descriptions review how some of the main modeling algorithms deal with data. The exact problems that working with unprepared data presents for modeling tools will not be reiterated here, as they are covered extensively in almost every chapter in this book. The small example data set has no missing values; if it had, they could not have been plotted.

But how does data preparation change the nature of the data? The whole idea, of course, is to give the modeling tools as easy a time as possible when working with the data. When the data is easy to model, better models come out faster, which is the technical purpose of data preparation. How does data preparation make the data easier to work with? Essentially, data preparation removes many of the problems.
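The contrast between an inflexible and a flexible manifold can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the book's method: the sample data is made up, the function names are hypothetical, and a Gaussian-kernel smoother stands in for the neural network as a simple flexible fit.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict_line(model, x):
    """Read the prediction off the fitted line at input value x."""
    a, b = model
    return a + b * x

def predict_smooth(xs, ys, x, bandwidth=0.05):
    """Gaussian-kernel weighted average of ys near x.

    Assumption: used here only as a stand-in for a flexible manifold;
    it is not a neural network.
    """
    weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

def mean_squared_error(predictions, ys):
    return sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys)

# Hypothetical sample: a curved relationship on [0, 1].
xs = [i / 20 for i in range(21)]
ys = [x ** 2 for x in xs]

line = fit_line(xs, ys)
line_mse = mean_squared_error([predict_line(line, x) for x in xs], ys)
smooth_mse = mean_squared_error([predict_smooth(xs, ys, x) for x in xs], ys)
```

On this curved sample the straight line cannot bend to follow the data, so `line_mse` comes out larger than `smooth_mse`. The flexible fit pays for that with an extra setting (`bandwidth`) that controls how flexible it is.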
This brief look is not intended to catalog all of the features and benefits of correct data preparation, but to give a feel for how it affects modeling. Consider the neural network, for example, as shown in the figure, fitting a flexible manifold to data. One of the problems is that the data points are closer together (higher density) in the lower-left part of the illustrated state space and far less dense in the upper right. Not only must a curve be fitted, but the flexibility of the manifold needs to be different in each part of the space.
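The uneven density described above can be checked by counting points in each region of the state space. The sketch below is only an illustration: the points are invented, the unit-square range and the 0.5 split are assumptions, and `quadrant_counts` is a hypothetical helper.

```python
def quadrant_counts(points, split_x=0.5, split_y=0.5):
    """Count how many (x, y) points fall in each quadrant of the space."""
    counts = {"lower_left": 0, "lower_right": 0,
              "upper_left": 0, "upper_right": 0}
    for x, y in points:
        vertical = "lower" if y < split_y else "upper"
        horizontal = "left" if x < split_x else "right"
        counts[f"{vertical}_{horizontal}"] += 1
    return counts

# Hypothetical sample on the unit square: squaring evenly spaced values
# crowds the points toward the origin (the lower left).
points = [((i / 20) ** 2, (i / 20) ** 2) for i in range(21)]
counts = quadrant_counts(points)
print(counts["lower_left"], counts["upper_right"])  # prints "15 6"
```

A large imbalance between regions, as in this sample, indicates that a fitted manifold has many points to follow in one part of the space and few in another.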
