Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 66

Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. To be able to discover and to extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered; this is the challenge created by today's abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

630 Maria Halkidi and Michalis Vazirgiannis

... that increase (or decrease) as the number of clusters increases, we search for the values of $n_c$ at which a significant local change in the value of the index occurs. This change appears as a "knee" in the plot, and it is an indication of the number of clusters underlying the data set. Moreover, the absence of a knee may be an indication that the data set possesses no clustering structure.

Below, some representative validity indices for crisp and fuzzy clustering are presented.

Crisp Clustering

Crisp clustering considers non-overlapping partitions, meaning that a data point either belongs to a class or does not. In this section we discuss validity indices suitable for crisp clustering.

The modified Hubert Γ statistic

The definition of the modified Hubert $\Gamma$ statistic (Theodoridis and Koutroubas, 1999) is given by the equation

\[
\Gamma = \frac{1}{M} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} P(i,j)\, Q(i,j)
\]

where $N$ is the number of objects in the data set, $M = N(N-1)/2$, $P$ is the proximity matrix of the data set, and $Q$ is an $N \times N$ matrix whose $(i, j)$ element is equal to the distance between the representative points $(v_{c_i}, v_{c_j})$ of the clusters to which the objects $x_i$ and $x_j$ belong.

Similarly, we can define the normalized Hubert $\hat{\Gamma}$ statistic, given by the equation

\[
\hat{\Gamma} = \frac{\dfrac{1}{M} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \bigl(P(i,j) - \mu_P\bigr)\bigl(Q(i,j) - \mu_Q\bigr)}{\sigma_P\, \sigma_Q}
\]

where $\mu_P$, $\mu_Q$ and $\sigma_P$, $\sigma_Q$ are the respective means and standard deviations of the entries of the $P$ and $Q$ matrices (that is, $\sigma_P^2$ and $\sigma_Q^2$ are their variances).

If $d(v_{c_i}, v_{c_j})$ is close to $d(x_i, x_j)$ for $i, j = 1, 2, \ldots, N$, then $P$ and $Q$ will be in close agreement and the values of $\Gamma$ and $\hat{\Gamma}$ (normalized $\Gamma$) will be high. Conversely, a high value of $\Gamma$ ($\hat{\Gamma}$) indicates the existence of compact clusters. Thus, in the plot of normalized $\Gamma$ versus $n_c$, we seek a significant knee that corresponds to a significant increase of normalized $\Gamma$. The number of clusters at which the knee occurs is an indication of the number of clusters present in the data. We note that for $n_c = 1$ and $n_c = N$ the index is not defined.

Dunn family of indices

A cluster validity index for crisp clustering proposed in (Dunn, 1974) aims at the identification of ...
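
Returning to the modified Hubert statistic defined earlier, the following is a minimal sketch of how its plain and normalized forms could be computed for a given crisp partition. Several choices here are assumptions and not part of the source: the function name `hubert_gamma` is hypothetical, the proximity measure is Euclidean distance, each cluster's representative point is taken to be its centroid, and the means and standard deviations are computed over the M pairs with i < j.

```python
import numpy as np


def hubert_gamma(X, labels, normalized=False):
    """Modified Hubert statistic of a crisp partition (illustrative sketch).

    X      : (N, d) array of data points.
    labels : length-N array of cluster labels.

    Assumptions (not from the source text): Euclidean distance is the
    proximity measure, and a cluster's representative point is its centroid.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels).ravel()
    n = X.shape[0]

    # idx[i] is the position of point i's cluster in `clusters`.
    clusters, idx = np.unique(labels, return_inverse=True)
    nc = len(clusters)
    if nc < 2 or nc >= n:
        # The text notes the index is not defined for nc = 1 and nc = N.
        raise ValueError("index is not defined for nc = 1 or nc = N")

    # Representative point (centroid) of the cluster each object belongs to.
    centroids = np.array([X[idx == k].mean(axis=0) for k in range(nc)])
    reps = centroids[idx]

    # P: pairwise distances between objects.
    # Q: pairwise distances between the objects' cluster representatives.
    P = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    Q = np.linalg.norm(reps[:, None, :] - reps[None, :, :], axis=-1)

    # Keep only the M = N(N-1)/2 pairs with i < j.
    iu = np.triu_indices(n, k=1)
    p, q = P[iu], Q[iu]

    if not normalized:
        return float(np.mean(p * q))  # (1/M) * sum of P(i,j) * Q(i,j)

    # Normalized form: centered products divided by the standard deviations.
    # If all centroids coincide, q.std() is 0 and the result is not finite.
    return float(np.mean((p - p.mean()) * (q - q.mean())) / (p.std() * q.std()))
```

To look for the knee described in the text, one would evaluate `hubert_gamma(X, labels, normalized=True)` on the partitions produced by a clustering algorithm for a range of $n_c$ values and plot the results against $n_c$; the choice of clustering algorithm is left open here.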
