Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 5

Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. To be able to discover and to extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered; this is the challenge created by today's abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

20 Jonathan I. Maletic and Andrian Marcus

process of data cleansing is also laborious, time consuming, and itself prone to errors. Useful and powerful tools that automate or greatly assist in the data cleansing process are necessary and may be the only practical and cost-effective way to achieve a reasonable quality level in existing data. While this may seem to be an obvious solution, little basic research has been directly aimed at methods to support such tools. Some related research addresses the issues of data quality (Ballou and Tayi, 1999; Redman, 1998; Wang et al., 2001), and some tools exist to assist in manual data cleansing and/or relational data integrity analysis.

The serious need to store, analyze, and investigate such very large data sets has given rise to the fields of Data Mining (DM) and data warehousing (DW). Without clean and correct data, the usefulness of Data Mining and data warehousing is mitigated. Thus, data cleansing is a necessary precondition for successful knowledge discovery in databases (KDD).

DATA CLEANSING BACKGROUND

There are many issues in data cleansing that researchers are attempting to tackle. Of particular interest here is the search context for what is referred to in the literature and the business world as "dirty data" (Fox et al., 1994; Hernandez and Stolfo, 1998; Kimball, 1996).
Recently, Kim (Kim et al., 2003) proposed a taxonomy for dirty data. It is a very important issue that will attract the attention of researchers and practitioners in the field. It is the first step in defining and understanding the data cleansing process.

There is no commonly agreed formal definition of data cleansing. Various definitions depend on the particular area in which the process is applied. The major areas that include data cleansing as part of their defining processes are data warehousing, knowledge discovery in databases, and data/information quality management (e.g., Total Data Quality Management, TDQM). In the data warehouse user community there is a growing confusion as
