Data Mining and Knowledge Discovery Handbook, 2nd Edition

Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. Discovering and extracting knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered; this is the challenge created by today's abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

32 Data Mining Model Comparison
Paolo Giudici
University of Pavia

Summary. The aim of this contribution is to illustrate the role of statistical models, and more generally of statistics, in choosing a Data Mining model. After a preliminary introduction to the distinction between Data Mining and statistics, we focus on how to choose a Data Mining methodology. This issue illustrates well how statistical thinking can bring real added value to a Data Mining analysis: without it, making a reasoned choice becomes rather difficult. In the third part of the paper, we present a case study in credit risk management that shows how Data Mining and statistics can profitably interact.

Key words: model choice, statistical hypothesis testing, cross-validation, loss functions, credit risk management, logistic regression models.

Data Mining and Statistics
Statistics has always been concerned with creating methods to analyze data. The main difference from the methods developed in Data Mining is that statistical methods are usually developed not only in relation to the data being analyzed but also according to a conceptual reference paradigm.
Although this has made the available statistical methods coherent and rigorous, it has also limited their ability to adapt quickly to the methodological demands raised by developments in information technology. There are at least four aspects that distinguish the statistical analysis of data from Data Mining. First, while statistical analysis traditionally concerns itself with analyzing primary data that has been collected to test specific research hypotheses, Data Mining can also concern itself with secondary data collected for other reasons. This is the norm, for example, when analyzing company data that comes from a data warehouse. Furthermore, while in the statistical field the data can be of an experimental nature: the data could be the result of an experiment which randomly allocates ...
