Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 73

Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. To be able to discover and to extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered; this is the challenge created by today's abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

700 Vicenç Torra

Algorithm 1: Optimal Univariate Microaggregation
Data: X: original data set, k: integer
Result: X': protected data set
1. Let $X = (a_1, \ldots, a_n)$ be a vector of size n containing all the values for the attribute being protected. Sort the values of X in ascending order so that if $i < j$ then $a_i \le a_j$.
2. Given A and k, a graph $G_{k,n}$ is defined as follows:
3.    Define the nodes of G as the elements $a_i$ in A plus one additional node $g_0$ (this node is later needed to apply the Dijkstra algorithm).
4.    For each node $g_i$, add to the graph the directed edges $(g_i, g_j)$ for all j such that $i + k \le j < i + 2k$. The edge $(g_i, g_j)$ means that the values $(a_i, \ldots, a_j)$ might define one of the possible clusters.
5.    The cost of the edge $(g_i, g_j)$ is defined as the within-group sum of squared error for such cluster. That is, $SSE = \sum_{l=i}^{j} (a_l - \bar{a})^2$, where $\bar{a}$ is the average record of the cluster.
6. The optimal univariate microaggregation is defined by the shortest path between the nodes $g_0$ and $g_n$. This shortest path can be computed using the Dijkstra algorithm.
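The graph construction and shortest-path step of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the handbook's implementation. The function name `optimal_univariate_microaggregation` and its return shape are hypothetical. The sketch also assumes one particular reading of the indices: with 0-based sorted values, the edge (g_i, g_j) is taken to cover the slice `a[i:j]`, so that every cluster has between k and 2k - 1 values and the path from g_0 to g_n covers each value exactly once.

```python
import heapq


def optimal_univariate_microaggregation(values, k):
    """Sketch of Algorithm 1. Returns (protected_values, total_sse).

    Assumption: edge (g_i, g_j) stands for the sorted slice a[i:j].
    """
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= len(values)")

    # Step 1: sort, remembering original positions so output keeps input order.
    order = sorted(range(n), key=lambda idx: values[idx])
    a = [values[idx] for idx in order]

    # Prefix sums of values and squares give the SSE of any slice in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for idx, v in enumerate(a):
        s1[idx + 1] = s1[idx] + v
        s2[idx + 1] = s2[idx] + v * v

    def sse(i, j):
        # Step 5: within-group sum of squared errors of a[i:j].
        total = s1[j] - s1[i]
        return max(0.0, (s2[j] - s2[i]) - total * total / (j - i))

    # Steps 2-4 and 6: Dijkstra from g_0 to g_n over the implicit graph.
    dist = [float("inf")] * (n + 1)
    prev = [None] * (n + 1)
    dist[0] = 0.0
    heap = [(0.0, 0)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale queue entry
        # Edges (g_i, g_j) with i + k <= j < i + 2k, clipped to g_n.
        for j in range(i + k, min(i + 2 * k - 1, n) + 1):
            nd = d + sse(i, j)
            if nd < dist[j]:
                dist[j] = nd
                prev[j] = i
                heapq.heappush(heap, (nd, j))

    # Walk the path back from g_n and replace each cluster by its mean.
    protected = [None] * n
    j = n
    while j > 0:
        i = prev[j]
        cluster_mean = (s1[j] - s1[i]) / (j - i)
        for pos in range(i, j):
            protected[order[pos]] = cluster_mean
        j = i
    return protected, dist[n]
```

For the values 1, 2, 3, 10, 11, 12 with k = 3, the only path from g_0 to g_n under this reading is g_0, g_3, g_6, so the two clusters are the three small and the three large values. The prefix-sum SSE is a convenience for the sketch and can lose precision on large floating-point inputs.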
Algorithm 2: Projected Microaggregation
Data: X: original data set, k: integer
Result: X': protected data set
1. Split the data set X into r sub-data sets $\{X_i\}_{1 \le i \le r}$, each one with $a_i$ attributes of the n records, such that $\sum_{i=1}^{r} a_i = A$.
2. foreach $X_i \in X$ do
3.    Apply a projection algorithm to the attributes in $X_i$, which results in a univariate vector $z_i$ with n components (one for each record).
4.    Sort the components of $z_i$ in increasing order.
5.    Apply to the sorted vector $z_i$ the following variant of the univariate optimal microaggregation method: use the algorithm defining the cost of the edges $(z_{i,s}, z_{i,t})$, with $s < t$, as the within-group sum of squared error for the $a_i$-dimensional cluster in $X_i$ which contains the original attributes of the records whose projected values are in the set $\{z_{i,s}, z_{i,s+1}, \ldots, z_{i,t}\}$.
6.    For each cluster resulting from the previous step, compute the $a_i$-dimensional centroid and replace all the records in the cluster by the centroid.

Heuristic approaches for sets of .
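The steps of Algorithm 2 can be sketched as follows, under several assumptions that are not in the text. The projection is a sum of standardized attribute values, chosen purely for illustration, since the algorithm only asks for "a projection algorithm". The names `sum_of_zscores`, `microaggregate_block`, `projected_microaggregation` and the `blocks` parameter (lists of column indices, one list per sub-data set) are hypothetical. Because all edges point forward, the shortest path is computed here with a single left-to-right pass instead of Dijkstra, and an edge (g_i, g_j) is taken to cover the ranked records `i` to `j - 1`.

```python
from statistics import mean, pstdev


def cluster_sse(rows):
    """Within-group sum of squared errors of a list of equal-length rows."""
    centroid = [mean(col) for col in zip(*rows)]
    return sum((x - c) ** 2 for row in rows for x, c in zip(row, centroid))


def sum_of_zscores(rows):
    """Hypothetical projection: sum of standardized attributes per record."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant columns contribute 0
    return [sum((x - m) / s for x, m, s in zip(row, mus, sds)) for row in rows]


def microaggregate_block(rows, k, project=sum_of_zscores):
    """Steps 3-6 for one sub-data set: rows are replaced by cluster centroids."""
    n = len(rows)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= number of records")
    z = project(rows)                                # step 3
    order = sorted(range(n), key=lambda idx: z[idx]) # step 4
    ranked = [rows[idx] for idx in order]

    # Step 5: shortest path g_0 -> g_n; edge cost is the multivariate SSE
    # of the original records whose projections fall in the slice.
    dist = [float("inf")] * (n + 1)
    prev = [0] * (n + 1)
    dist[0] = 0.0
    for i in range(n):
        if dist[i] == float("inf"):
            continue  # node not reachable from g_0
        for j in range(i + k, min(i + 2 * k - 1, n) + 1):
            cost = dist[i] + cluster_sse(ranked[i:j])
            if cost < dist[j]:
                dist[j], prev[j] = cost, i

    # Step 6: replace every record by the centroid of its cluster.
    out = [None] * n
    j = n
    while j > 0:
        i = prev[j]
        centroid = [mean(col) for col in zip(*ranked[i:j])]
        for pos in range(i, j):
            out[order[pos]] = list(centroid)
        j = i
    return out


def projected_microaggregation(records, blocks, k, project=sum_of_zscores):
    """Steps 1-2: apply the block procedure to each group of attributes."""
    protected = [list(row) for row in records]
    for cols in blocks:
        sub = [[row[c] for c in cols] for row in records]
        new_rows = microaggregate_block(sub, k, project)
        for row_out, new_vals in zip(protected, new_rows):
            for c, v in zip(cols, new_vals):
                row_out[c] = v
    return protected
```

A call such as `projected_microaggregation(records, [[0, 1], [2]], 3)` would treat the first two attributes as one sub-data set and the third as another. Any other projection, for example a first principal component, could be passed through the `project` argument.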