Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 51

Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. To be able to discover and to extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered; this is the challenge created by today’s abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

Swagatam Das and Ajith Abraham

Clustering can also be performed in two different modes: crisp and fuzzy. In crisp clustering, the clusters are disjoint and non-overlapping, so any pattern belongs to one and only one class. In fuzzy clustering, a pattern may belong to all the classes, each with a certain fuzzy membership grade (Jain et al., 1999).

The most widely used iterative k-means algorithm (MacQueen, 1967) for partitional clustering aims at minimizing the ICS (Intra-Cluster Spread), which for k cluster centers can be defined as

$$ICS(C_1, C_2, \ldots, C_k) = \sum_{i=1}^{k} \sum_{X_t \in C_i} \lVert X_t - m_i \rVert^2$$

where $m_i$ denotes the center of cluster $C_i$.

The k-means (or hard c-means) algorithm starts with k cluster centroids, which are initially selected randomly or derived from some a priori information. Each pattern in the data set is then assigned to the closest cluster center. Centroids are updated by using the mean of the associated patterns. The process is repeated until some stopping criterion is met.

In the c-medoids algorithm (Kaufman and Rousseeuw, 1990), on the other hand, each cluster is represented by one of the representative objects in the cluster located near the center.
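
The k-means iteration described above (assign each pattern to its closest center, recompute each center as the mean of its patterns, repeat until a stopping criterion is met) can be sketched in plain Python as follows. This is a minimal illustration, not the handbook's implementation: the function names (`squared_distance`, `assign`, `update`, `ics`, `k_means`), the `init`, `max_iter` and `seed` parameters, the handling of empty clusters, and the choice of "centers stop changing" as the stopping criterion are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(data, centers):
    """Assignment step: put each pattern in the cluster of its closest center."""
    clusters = [[] for _ in centers]
    for x in data:
        nearest = min(range(len(centers)),
                      key=lambda i: squared_distance(x, centers[i]))
        clusters[nearest].append(x)
    return clusters


def update(clusters, centers):
    """Update step: each center becomes the mean of its assigned patterns."""
    new_centers = []
    for members, old in zip(clusters, centers):
        if members:
            new_centers.append(
                [sum(x[d] for x in members) / len(members)
                 for d in range(len(old))]
            )
        else:
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(list(old))
    return new_centers


def ics(clusters, centers):
    """Intra-cluster spread: summed squared distance of patterns to their center."""
    return sum(squared_distance(x, m)
               for members, m in zip(clusters, centers)
               for x in members)


def k_means(data, k, init=None, max_iter=100, seed=0):
    """Hard k-means sketch; returns (centers, clusters).

    `init` stands in for centers derived from a priori information;
    when it is None, k patterns are drawn at random as initial centers.
    """
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(data, k)
    centers = [list(c) for c in start]
    for _ in range(max_iter):
        new_centers = update(assign(data, centers), centers)
        if new_centers == centers:  # assumed stopping criterion: no change
            break
        centers = new_centers
    return centers, assign(data, centers)


# Two well-separated groups of 2-D patterns (made-up data for illustration).
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = k_means(data, k=2, init=[(0.0, 0.0), (10.0, 10.0)])
print(centers, ics(clusters, centers))
```

Passing `init` makes the run deterministic; with random initialization the result can depend on the seed, since k-means only finds a local minimum of the ICS.
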
Partitioning around medoids (PAM) (Kaufman and Rousseeuw, 1990) starts from an initial set of medoids and iteratively replaces one of the medoids by one of the non-medoids if the swap improves the total distance of the resulting clustering. Although PAM works effectively for small data sets, it does not scale well to large ones. Clustering large applications based on randomized search (CLARANS) (Ng and Han, 1994) deals with this scalability issue by using randomized sampling.

The fuzzy c-means (FCM) algorithm (Bezdek, 1981) seems to be the most popular algorithm in the field of fuzzy clustering. In the classical FCM algorithm, a within-cluster sum function $J_m$ is minimized to evolve the proper cluster centers:

$$J_m = \sum_{j=1}^{n} \sum_{i=1}^{c} (u_{ij})^m \lVert X_j - V_i \rVert^2$$

where $V_i$ is the i-th cluster center, $X_j$ is the j-th d-dimensional data vector, and $\lVert \cdot \rVert$ is an inner product-induced norm.
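
To make the FCM objective concrete, the following Python sketch evaluates the within-cluster sum J_m for given centers and memberships. It only computes the objective; it does not perform the minimization. The names `squared_distance` and `fcm_objective`, the layout `u[i][j]` (membership of pattern j in cluster i), the use of the Euclidean norm, and the default exponent `m=2.0` are assumptions chosen for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm_objective(data, centers, u, m=2.0):
    """Sum over patterns j and clusters i of u[i][j]**m * ||X_j - V_i||^2.

    u[i][j] is the (assumed) membership grade of pattern j in cluster i.
    """
    return sum((u[i][j] ** m) * squared_distance(x, v)
               for j, x in enumerate(data)
               for i, v in enumerate(centers))


# Made-up example: two patterns, two centers placed on the patterns.
data = [(0.0, 0.0), (2.0, 0.0)]
centers = [(0.0, 0.0), (2.0, 0.0)]

fuzzy_u = [[0.5, 0.5], [0.5, 0.5]]   # every pattern shared equally
crisp_u = [[1.0, 0.0], [0.0, 1.0]]   # each pattern in exactly one cluster

print(fcm_objective(data, centers, fuzzy_u))
print(fcm_objective(data, centers, crisp_u))
```

With crisp memberships (each grade 0 or 1), the weights u^m equal u, so the objective reduces to the summed squared distance of each pattern to its own cluster center, which is the intra-cluster spread used by hard k-means.
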
