Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 103
Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. To be able to discover and to extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered; this is the challenge created by today's abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

Vicenc Torra

Preprocessing Data

Collected data usually contains errors, either introduced on purpose (e.g., to protect confidentiality, as in privacy-preserving Data Mining) or due to incorrect data handling. Such errors make data processing a difficult task, as incorrect models might be inferred from the erroneous data. This situation is even more noticeable in multi-database data mining (Wrobel, 1997; Zhong et al., 1999). In such a framework, models have to be extracted from data distributed among several databases. The data is then usually inconsistent: attributes in different databases are not coded in a unified way, they may have different names, and their domains may differ. Information fusion techniques make it possible to deal with some of these difficulties. We describe below some of the techniques currently in use for dealing with these problems, namely re-identification algorithms in multi-database Data Mining, and fusion and aggregation operators for improving the quality of data (for both multi-database and single-source data mining).

Re-identification Algorithms

In the construction of models from multiple databases, re-identification methods play a central role.
Their purpose is to link data descriptions that, although distributed across different data files, belong to the same object. To formalize such methods, consider a database A and a database B, both containing information about the same individuals, the former described in terms of attributes A1, ..., An and the latter in terms of attributes B1, ..., Bm. In this setting we can distinguish two groups of algorithms:

Record Linkage or Record Matching Methods

Given a record r in A, such methods consist of finding all records in B that correspond to the same individual as r. Different methods rely on different assumptions about the attributes Ai and Bi and about the underlying model for the data. Classical methods assume that both files share a large enough set of
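
To make the record linkage task concrete, the following is a minimal sketch in Python, not a method taken from the handbook. It assumes, purely for illustration, that the two files have some attributes in common and that exact agreement on those attributes is a usable matching signal. The names `agreement`, `link_record` and `min_agreement`, as well as the toy records, are hypothetical.

```python
from typing import Dict, List, Sequence

# A record is modelled as a mapping from attribute name to value (an assumption).
Record = Dict[str, str]


def agreement(r: Record, s: Record, shared: Sequence[str]) -> float:
    """Return the fraction of shared attributes on which r and s hold equal values."""
    if not shared:
        return 0.0
    hits = sum(
        1
        for name in shared
        if r.get(name) is not None and r.get(name) == s.get(name)
    )
    return hits / len(shared)


def link_record(
    r: Record,
    file_b: Sequence[Record],
    shared: Sequence[str],
    min_agreement: float = 1.0,
) -> List[int]:
    """Return the indices of all records in file_b that agree with r on at
    least `min_agreement` of the shared attributes (hypothetical threshold)."""
    return [
        i for i, s in enumerate(file_b)
        if agreement(r, s, shared) >= min_agreement
    ]


# Toy data: two small files describing overlapping individuals.
file_a = [
    {"name": "ana", "year": "1980", "city": "girona"},
    {"name": "joan", "year": "1975", "city": "reus"},
]
file_b = [
    {"name": "joan", "year": "1975", "city": "reus"},
    {"name": "ana", "year": "1980", "city": "vic"},
    {"name": "marta", "year": "1990", "city": "girona"},
]
shared = ["name", "year", "city"]

exact_matches = link_record(file_a[1], file_b, shared)
loose_matches = link_record(file_a[0], file_b, shared, min_agreement=0.6)
```

With the default threshold, only records that agree on every shared attribute are linked. Lowering `min_agreement` tolerates some disagreement, such as the differing `city` value in the toy data, which is one simple way to cope with errors in the collected data.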